What’s in Your Toolbox?


If you were to start a discussion on software tools, most people’s initial frame of
reference would probably be tools such as modeling languages, compilers, word
processors, project management tools, etc. While these are all important tools, I hope
that this issue of CROSSTALK will expand the frame of reference for our readers. When
considering software tools that may help with software development and acquisition,
projects will realize more benefit if the project team expands its consideration of
tools to include helpful processes and techniques in addition to software products such
as those listed above.

Over the years, CROSSTALK has shared many tools that apply to most aspects of software
development and acquisition; some examples that readily come to mind include updating
legacy code, working with people, information security, architectures, and processes such
as those promoted in the Capability Maturity Model® (CMM®) Integration and ISO 9000.
These are all great tools for improving the quality of software projects, the efficiency
in developing software, and the ability to accurately predict the cost and schedule of
delivery. The CROSSTALK staff developed this special issue to highlight the idea that
tools for developing software are more than just software products. Some of the most
useful software tools are the ones most often neglected by software developers, yet much
effort has been expended over the past several years to educate developers (and now
acquirers) about these tools and their benefits.

We begin this issue of CROSSTALK with an article from Dr. Alistair Cockburn that truly
stresses this point. In What the Agile Toolbox Contains, Cockburn discusses numerous
tools from all angles of this discussion. If you’re not involved with agile software
development, you’ll see many of these tools apply to other development methods as well.
I recommend this article for everyone.

In A Revolutionary Use of COTS in a Submarine Sonar System, Capt. Gib Kerr and Robert W.
Miller share the success they have achieved thanks to the use of commercial off-the-shelf
(COTS) software. As discussed, this effort was not without its drawbacks, but the benefits
outweighed the problems.

Next, Dr. Mikhail J. Atallah, Eric D. Bryant, and Dr. Martin R. Stytz discuss various
approaches to anti-tamper technologies in A Survey of Anti-Tamper Technologies. This
discussion encompasses introductory descriptions of recommended technologies, including
their benefits and their drawbacks.

Safety  critical  software  presents  additional  challenges  for  the  developers.  In  Safety  Analysis  as 
a  Software  Tool,  Blair  T.  Whatcott  discusses  the  basic  steps  for  safety  analysis  and  reminds  the 
readers  that  safety  analysis  must  be  performed  at  the  system  level,  since  many  hazards  exist  at 
interfaces  between  system  components. 

In Three Essential Tools for Stable Development, Andy Hunt and Dave Thomas share their
experience that configuration management, unit testing, and automation are key to
mitigating a majority of the common problems experienced by software developers.

We conclude this issue with a high-level discussion on making measures more useful with
David B. Putman’s Your Quality Data Is Talking — Are You Listening? Putman was one of the
key people who helped Hill Air Force Base’s Software Engineering Division receive a
Level 5 rating on the CMM. In this article, he discusses some of the thought processes
that helped so much with the measurement efforts.

You might notice that there are no supporting sections this month. The reason for this is
that the CROSSTALK staff believes all of the ideas discussed in these articles should be
considered useful tools that support software development. We hope CROSSTALK is also in
your software toolbox.


Elizabeth  Starrett 

Associate  Publisher 
What the Agile Toolbox Contains


Dr.  Alistair  Cockburn 
Humans  and  Technology 

The agile development community is noted for scorning Computer-Aided Software Engineering
modeling and Gantt project scheduling tools (among others), but what has it replaced them
with? Conducting a survey of agile teams for tools they say help produce better software
quicker, this author found they used a cross-disciplinary set of mental, social,
environmental, mechanical, and process tools, in addition to a carefully selected set of
software-based tools. This list of tools can help your organization prepare for the tools
(human resource, facilities, software, and non-software) that will be requested and used
by the team starting to adopt the agile approach.


The word tool usually brings to mind a physical or software device. However,
agile software development teams have removed many of the usually mentioned
hi-tech development tools from their repertoire. Thus, in conducting a survey of
tools agile teams say produce better software sooner, I had to be more general in
considering what might be regarded as a tool when asking, “What does the agile
toolbox contain?”

The set of tools that agile development teams consider part of their toolbox
is very broad, ranging in purpose across hiring, collaborating, communicating,
managing, developing, etc. Their tools also range in form across environmental
(such as office layout), social, physical, process, thinking, and computer-based.

For  the  survey,  I  seeded  a  discussion 
about  tools  and  posted  requests  for  input 
to  four  agile  development  discussion 
groups  [1,  2,  3,  4].  Originally,  I  intended  to 
describe  a  few  of  the  more  unusual  items 
on  the  resulting  list.  However,  the  toolset 
that  arrived  back  was  so  interesting  when 
considered  as  a  whole  that  I  chose  to  show 
it  in  its  entirety. 

When people are deciding whether to use an agile development approach on an
upcoming project, they can work through this list together, considering the
implications of each item on their budget and work habits. Then it will not be
such a surprise when the team starts to rearrange the cubicles and request
different furniture, post bits of paper all over the wall, or ask to have job
applicants co-program with them for a morning.

This  article  is  arranged  in  the  following 
sections: 

• A brief description of agile development with references for further
reading and a short description of terms that will show up in the tool lists.

•  The  tools  grouped  by  the  purpose  they 
support. 


•  The  tools  itemized  by  form. 

•  Reflection  on  this  list  as  a  whole. 

Agile Development Acronyms
and Key Words

Generally  speaking,  teams  using  the  agile 
development  approach  focus  strongly  on 
collaboration  and  rapid  feedback  from 
running  code. 

Collaboration is expected not only within the development team but also
across organizational boundaries, with expert users and project sponsors.
Collaboration involves group workshop techniques for project planning,
requirements gathering and design, and programming in pairs or in close
proximity such as in a war-room setting.

Collaboration also involves using information radiators [5], large displays
showing up-to-date information placed in public for people to see whenever
they pass by. Information radiators are used in workshops, in the war-room
setting, and between continents to keep people in sync on their goal and
their state.

Rapid feedback is based on running, tested, integrated system features, or RTF
[6]. The project plan is constructed, and progress is measured in terms of the
steadily increasing set of integrated features. The team seeks early and frequent
integration of features to get feedback about the team, the process it is using,
and how the requirements fit the actual needs of the user base.

Attending to collaboration and feedback through RTF drives the selection of
many of the tools listed in this article. The agile team cares that the
following occurs:

•  The  right  roles  are  established  for  the 
team. 

•  The  people  who  show  up  fit  with  the 
rest  of  the  team. 

•  The  people  develop  particular  skills. 

•  The  environment  is  effective  to  the 
development  task. 

•  They  use  selected  process  elements. 

• Collaboration and communication are facilitated.

• The mechanical, hardware, and software tools used are easy to use, see, and
update; are effective; and support the agile approach.

Teams debate items in all of these categories and will feel endangered or
strengthened when the various items are removed or included. It is on this
basis that I consider such a broad range of items to be tools.

There is not space here to undertake a longer description of agile development;
it has been heavily described in books and articles. Perhaps the best
introduction to the thinking and practices involved is found in “Agile Software
Development Ecosystems” [7] and the articles “The New Methodology” [8], “The
Business of Innovation” [9], and “The People Factor” [10]. The Agile Alliance
[11] and the Agile Project Management Group [12] offer much more information.

Listed  below  are  some  terms  that  may 
not  be  familiar  to  the  reader: 

•  Dynamic  System  Development 
Method  (DSDM):  A  founding  agile 
methodology  created  in  the  United 
Kingdom  in  the  late  1990s  [13]. 

•  Scrum:  A  founding  agile  methodology 
created  in  the  mid-1990s  [14]. 

• Scrum master: In the Scrum methodology, a form of team leader who
specializes in getting people to talk together and in removing obstacles to
progress.

• Gold cards: A token allowing a developer to work on something other than
scheduled features.

• Class-Responsibility-Collaborator (CRC) cards: An object-oriented
design technique in which designers write class names on index cards and
role-play the design with the cards [15].

• Java 2 Enterprise Edition (J2EE): A widely used component library
marketed by Sun Microsystems.

•  Unified  Modeling  Language 
(UML):  A  widely  used  graphical 
design  documentation  notation. 

If  there  are  other  terms  you  find  unfa¬ 
miliar,  a  quick  Web  search  is  sure  to  turn 
up  descriptions  and  discussions  of  them. 

Tools  by  Purpose 

Included here are entries only for hiring, collaboration, communication, and
management purposes. The entries for other activities should be fairly obvious
when reading the list grouped by form later in this article.

Hiring 

To  hire  the  appropriate  people  for  the 
team,  you  must  first  identify  the  roles 
needed  and  the  people  to  fit  those  roles. 
Process  and  social  tools  are  used  here. 

To avoid the standard mistakes in hiring, the tool most often used is a few
hours of pair programming with the team members. Teams report being able to
tell a lot more about how an applicant thinks, designs, communicates, and fits
with the team from this experience. Even without pair programming, interviewers
focus on discovering not only an applicant’s technical abilities, but also
their personal fit with the organization.

Two new roles or skills are sought: facilitators and coaches. In the late
1990s, the founders of the DSDM felt so strongly that their project teams
needed proper facilitation expertise that they helped develop an
internationally recognized facilitator training and certification program
[16]. An increasing number of software people are becoming certified public
facilitators, and more are taking basic facilitator courses.

Coach  (from  eXtreme  Programming 
[XP])  and  scrum  master  (from  Scrum)  are 
job  titles  designed  to  change  the  power 
relationship  and  interaction  dynamics 
from  the  traditional  team  lead  or  project 
manager.  The  coach  or  scrum  master  is  a 
lead  person  whose  job  typically  is  to  keep 
desired  practices  in  place  and  remove 
obstacles  for  the  group,  but  not  to  create 
schedules  for  the  developers  or  construct 
their  end-of-year  performance  reviews. 
Therefore,  the  group  perceives  them  as  a 
leading  colleague  rather  than  a  boss. 

Collaboration 

Although physical proximity, whiteboards, poster sheets, index cards, and
sticky notes are still the dominant tools used in collaboration, people started
finding and inventing online collaboration tools as agile development moved
into distributed development. These tools will be listed separately in the
computer-based category. They generally include WikiWiki and thread-based
discussion group technologies, instant messaging technologies with group and
recording variants, and distributed brainstorming technologies.

Whether collocated or distributed, the two prevalent process tools for
collaboration are workshops and short daily status meetings. Workshops are used
to gather requirements, understand usage patterns, plan the project, and design
the software. To support the workshops, specific office facilities are
required, including group work areas with lots of wall space, speakerphones,
and videoconferencing.

Communication 

Active and passive communication remains a dominant trait of agile
development, whether the team is collocated or distributed.

Active  communication  involves  two  or 
more  people  working  on  the  same  task, 
whether  at  a  whiteboard,  sitting  side-by- 
side  looking  at  the  same  screen,  or  using 
shared  workspace  technology  to  look  at 
the  same  screen  from  different  sites. 

Passive communication involves information radiators. These are most often on
paper or whiteboard. When the information changes on a minute-by-minute basis,
information radiators are sometimes driven online. Information radiators
include the following:

•  A  flat  monitor  hung  over  the  cubicle 
wall  [17]. 

• A real traffic light hung in the development area that is controlled by an
automated build machine [18].

• An ambient orb reporting the same as the traffic light, but using a
nationally broadcast signal so teams in all locations can see the same
information [19, 20].

•  The  build  status  maintained  on  a  Web 
page  so  the  developers  can  see  what 
happened  to  the  code  they  just 
entered. 
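
The build-driven radiators in the list above can be sketched in a few lines. This is only an illustration: the article does not say which colors the teams used, so the three-color mapping and the names `Light` and `light_for` are assumptions made for the example, not a description of any team's actual setup.

```python
from enum import Enum


class Light(Enum):
    GREEN = "green"    # assumed meaning: last build and its tests passed
    YELLOW = "yellow"  # assumed meaning: a build is currently running
    RED = "red"        # assumed meaning: last build or its tests failed


def light_for(build_running: bool, last_build_passed: bool) -> Light:
    """Map the state of a hypothetical build machine to a lamp color."""
    if build_running:
        return Light.YELLOW
    return Light.GREEN if last_build_passed else Light.RED


if __name__ == "__main__":
    # A build machine would call this after every build and switch the lamp.
    print(light_for(build_running=False, last_build_passed=True).value)
```

The same function could feed a Web page or an ambient orb; only the output device changes, while the information radiated stays the same.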

Management 

Agile teams have replaced Gantt charts with earned value and burn-down charts
[21], graphs of tests created versus passed, and similar charts. To report
these to upper management, collocated teams still like the effects of posters
taped to the wall or spreadsheet graphs. A fresh set of online project
management tools is entering the market, including Rally, VersionOne, and
XPlanner.

Whether  online  or  on  paper,  these 
tools  report  status  with  respect  to  RTF,  not 
planning,  design,  or  documentation  tasks. 
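
As a rough sketch of what a burn-down chart records, the following computes the number of features still open after each iteration. The function name `burn_down` and the sample numbers are hypothetical; real teams may count story points or tasks instead of features.

```python
from typing import List


def burn_down(total_features: int, completed_per_iteration: List[int]) -> List[int]:
    """Return the number of features still open after each iteration."""
    remaining = total_features
    chart = [remaining]  # starting point, before the first iteration
    for done in completed_per_iteration:
        remaining = max(remaining - done, 0)  # never drop below zero
        chart.append(remaining)
    return chart


if __name__ == "__main__":
    # Print a crude text chart, one row per iteration.
    for iteration, left in enumerate(burn_down(20, [4, 6, 5])):
        print(f"iteration {iteration}: {left:2d} " + "#" * left)
```

Because only running, tested, integrated features reduce the count, the chart reports status with respect to RTF rather than completed planning or design tasks.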

Tools  by  Form 

Here  are  the  tools  clustered  by  their  form: 
environmental,  social,  physical,  process, 
mental,  and  computer-based. 

Environmental 

You  are  likely  to  find  that  the  agile  team 
will  either  request  a  different  office  layout 
or  will  simply  rearrange  their  given  space 
to  enhance  collaboration.  The  following 
are  common  desires: 

•  Common  design  and  programming 
areas. 

• Lots of wall space for posting information radiators.

•  Convex  or  straight  desks  so  people  can 
cluster  around  the  monitor. 

• A common couch area with a whiteboard (recording type, preferably).

•  Kitchen,  for  social  discussions  during 
breaks. 

Social 

The top social tools are collocating teams and attacking problems in workshop
sessions. Other social tools revolve around increasing the tolerance or
amicability of people toward each other, giving them a chance to alternate
high-pressure work with decompression periods, and allowing them to feel good
about their work and their contributions. The following are desired social
tools:

•  Social  roles  such  as  coach,  facilitator, 
and  scrum  master. 

• Collocated teams (for fast communication and also the ability to learn about
each other).

•  Personal  interaction  (within  and  across 
specialties). 

•  Facilitated  workshop  sessions. 

•  Daily  stand-up  status  meetings. 

•  Retrospectives  and  reflection  activities. 

•  Assisted  learning  provided  by  lunch- 
and-learn  sessions,  pair  programming 
sessions,  and  having  a  coach  on  the 
project. 

•  Pair  programming  (to  provide  peer 
pressure). 

•  A  shared  kitchen. 

•  Toys  (to  allow  humor  and  reduce 
stress). 

• Celebrations of success and acknowledgment of defeat.

• Gold cards issued at an established rate (to allow programmers to
investigate other technical topics for a day or two).

• Off-work get-togethers (typically a Friday evening visit to a nearby pub,
wine-and-cheese party, even volleyball, foosball, or Doom competitions).

• Posting information radiators in unusual places to attract attention (the
most unusual I have seen is the number of open defects being posted in the
bathroom [22]).

Physical  Devices 

The best physical devices augment individual thinking, group thinking, and
social interaction. The following are some of the preferred ones:

• Index cards and Post-it notes (in any gathering of agile developers,
someone is likely to have a pack of index cards with them).

•  Butcher  paper  lining  walls  and  halls. 

•  Whiteboards  (standard  or  moveable, 
printing,  recording,  or  with  a  camera). 

•  Poster  sheets  (plain  paper,  3M  sticky, 
or  plastic  cling  sheets). 


Process 

Preferred process tools include short, time-boxed iterations, frequent
integration, and frequent delivery. Next to these are workshops for various
purposes. Some of the tools are both process and social in form, so I risk
listing them twice. They are as follows:

•  Project  planning  jam  session  (XP’s 
planning  game  [23],  Crystal  Clear’s 
blitz  planning  [17],  or  Scrum’s  sprint 
planning). 

•  Requirements  workshop. 

•  Group  design  workshop. 

•  Reflection  or  retrospective  workshop. 

•  Pair  programming  session. 

•  Refactoring  code. 

• Growing the system (creating a very small but functional implementation,
adding both infrastructure and functionality).

•  Time  boxing. 

• Spike prototyping (throwaway prototyping lasting not more than a day or two).

•  Early  integration. 

•  Frequent  delivery. 

•  Programmers  writing  unit  tests. 

•  Customer  writing  acceptance  tests. 

•  Tracking  by  earned  value,  burn-down, 
or  backlog. 

Thinking 

Agile  developers  may  or  may  not  model  the 
domain  with  UML,  but  they  do  have  tools 
for  helping  them  decide  what  and  how  to 
code,  starting  with  using  the  brain. 
Thinking  tools  include  the  following: 

•  Brain-engaged  common  sense  [24]. 

• Test-first design (assertion-driven design).

•  CRC  cards. 

•  KISS  (keep  it  simple,  stupid). 

•  Once-and-only-once  code  (do  not 
repeat  yourself  in  your  code). 
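
To make test-first (assertion-driven) design concrete, here is a minimal sketch in Python. The leap-year example is purely illustrative and is not taken from the survey; the point is the order of work, in which the assertions are written before the code they exercise.

```python
def test_leap_year() -> None:
    # Step 1: written first, these assertions state the intended behavior.
    assert is_leap_year(2004)
    assert not is_leap_year(1900)
    assert is_leap_year(2000)
    assert not is_leap_year(2003)


def is_leap_year(year: int) -> bool:
    # Step 2: the simplest code that makes the assertions above pass.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    test_leap_year()
    print("all assertions passed")
```

The rule lives in exactly one function, which also keeps the example in line with once-and-only-once code.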

Computer-Based 

Jeff  Patton  writes: 

Of  course,  agile  developers  have  a 
long  history  of  tool  building  —  I 
think  that  started  with  chimpanzees 
using  sticks  to  get  bugs  out  of 
stumps. Today we use xDoclet to
generate J2EE interfaces and classes,
which is a lot like getting bugs
out of stumps [25].

There  are  enough  entries  in  this  list  that 
I  need  to  group  them  by  purpose. 
Obviously,  I  am  not  attempting  a  full  listing 
of  tool  vendors,  so  I  name  only  one  or  two 
sample  entries  of  available  online  tools 
where  that  is  relevant.  I  apologize  to  the 
other  tool  suppliers. 


Computer-Based  Tools  By 
Purpose 

Communication/Collaboration Tools

Here  are  communication  or  collaboration 
tools  that  require  software: 

•  Group  discussion  technologies  such  as 
WikiWiki,  Yahoo!  eGroups,  Lotus 
Notes,  Starteam,  NetMeeting,  WebEx, 
phpBB,  and  blogs. 

• Instant messaging, including group messaging, messaging with drawing,
and messaging with discussion thread management. Examples include
Yahoo! Messenger with Doodle Imvironment (so people can draw at
each other as well as talk), Jabber, AIM, GAIM (group chat), Engage
Thoughtware (thread management), and Trillian.

•  Collaboration  software  packages  such 
as  Marratech,  Raindance,  Sparrow, 
Flywheel,  Thoughtware,  and  Borland’s 
Caliber. 

•  Video  projectors  for  group  coding, 
learning,  and  discussion  sessions. 

Documenting  Tools 

There is an overlap between collaboration tools and documentation tools.
Increasingly, teams look for easy ways to put the results of a group workshop
into archive format. Often that involves a camera, but sometimes it means
using an online tool during collaboration, including the following:

• Recording whiteboards; scanners; and archiving message, discussion, and
collaboration tools (the output is simply put or linked into the
documentation).

•  Generic  drawing  tools,  PowerPoint, 
Visio,  Dia,  and  ArgoUML  (free) 
replace  expensive  computer-aided 
software  engineering  packages. 

Project  Tracking  Tools 

These  are  online  alternatives  to  poster 
sheets  posted  on  the  wall,  and  are  particu¬ 
larly  useful  for  distributed  teams  and  for 
projects  whose  requirements  or  plans 
change  multiple  times  per  week.  They  are 
as  follows: 

•  Spreadsheets  (used  to  hold  project 
plan  and  status,  and  derive  tracking 
graphs). 

• Software for tracking the project against stories and tasks, such as
XPlanner (free), Rally Software’s Agile Release Management, Borland’s
CaliberRM, and VersionOne.

Designing-Programming  Tools 

Here are the essential tools requested by
agile programming teams:

• Configuration management/version control (Concurrent Versions System
or your favorite).

• Automated unit test harness such as JUnit or any of the xUnit family.

•  Automated  acceptance  test  harness 
such  as  Fit  or  FitNesse. 

• Automated build system, preferably a continuous build system such as
augmented Ant (Another Neat Tool) or CruiseControl.

• Refactoring development environment (safe refactoring built in) such as
IntelliJ IDEA, Eclipse, or ReSharper.

• Performance profiling tool such as JMeter, JProfiler, or JProbe.

• Laptops on a wireless network for programming anywhere.
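
For readers who have not used an xUnit-style harness, the sketch below shows the general shape using Python's standard `unittest` module, which follows the xUnit pattern. The article's own examples are Java tools; Python is used here only for brevity, and `remaining_features` is a hypothetical function invented for the illustration.

```python
import unittest


def remaining_features(planned: int, integrated: int) -> int:
    """Hypothetical function under test: features not yet integrated."""
    if integrated > planned:
        raise ValueError("cannot integrate more features than were planned")
    return planned - integrated


class RemainingFeaturesTest(unittest.TestCase):
    def test_counts_open_features(self) -> None:
        self.assertEqual(remaining_features(10, 4), 6)

    def test_rejects_impossible_input(self) -> None:
        with self.assertRaises(ValueError):
            remaining_features(3, 5)


def run_suite() -> unittest.TestResult:
    """Load and run the test case, as an automated build step might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RemainingFeaturesTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

A continuous build system would typically run such a suite after every check-in and report the result, which is what makes the build-status radiators described earlier possible.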

Other  Resources 

The Internet contains many discussions of social, process, physical, and
computer-based tools for agile development. Ken Boucher has created the Web
site <www.fairlygoodpractices.com> for collecting a number of social and
process tool descriptions.

To discover your own set, simply hold a workshop with the people on your team
and ask them what mental, social, environmental, and physical devices help
them in their work. My experience is that they will be glad to share, and you
will end up with an impressive list of your own.

Reflection  on  the  Lists 

I was surprised at the breadth of tools requested by agile teams, by how far
back into the hiring cycle these tools extend, and by the number and
importance of the social tools. I was surprised at how far the industry has
come in supporting distributed teams with distance collaboration and automated
build systems.

As I wrote in [5], understanding passes from person to person more rapidly
when they are standing next to each other, as when they are discussing at a
whiteboard. Agile teams stress using tools that permit the rapid flow of
understanding. Some of those tools are social, starting even at the hiring
stage. Some tools are technological, helping distributed teams simulate being
physically present. Many tools are physical, allowing people to manipulate
them in workshops.

If collaboration is one leg that agile development stands on, the other is
rapid feedback from running code. Configuration management, automated testing,
refactoring, and performance profiling tools are the dominant entries here. As
Michael Vizdos reminds us, do not forget to keep brain and common sense
engaged [24].
A Revolutionary Use of COTS in a
Submarine Sonar System

Capt. Gib Kerr
Program Executive Office Submarines

Robert W. Miller
Anteon Corporation

The AN/BQQ-10(V) Acoustic Rapid Commercial off-the-shelf (COTS) Insertion (A-RCI)
submarine sonar system has been repeatedly cited as one of the Department of Defense’s
premier examples of using COTS technology to provide significantly improved system
performance at far lower costs than previously possible. The ability to rapidly and
inexpensively upgrade a ship’s sonar hardware suite to provide continually increasing
sonar performance has helped to restore United States submarine superiority over all
potential adversaries. As part of this revolution in RCI, the program has identified
several lessons on using COTS hardware and software that can help other programs making
the same leap into the COTS world.


By  the  mid-1990s,  the  United  States 
Navy’s  submarine  force  had  lost  its 
once  seemingly  insurmountable  lead  in 
detecting  and  tracking  foreign  submarines. 
The  use  of  improved  acoustic  quieting 
measures  on  foreign  submarines  as  well  as 
the  worldwide  proliferation  of  modern 
diesel-electric  submarines  had  sharply 
reduced  the  acoustic  advantage  that  the 
United  States  had  held  since  the  mid- 
1950s.  In  addition,  the  end  of  the  Cold 
War  brought  about  a  significant  reduction 
in  available  funding  to  develop  and  field 
the  improvements  necessary  to  restore 
superiority. The operating forces were forced to use carry-on commercial
systems in an effort to regain some of the advantage that had been lost. These
black boxes did provide some help but were not fully integrated with the
remainder of the ship’s combat system, thereby reducing their effectiveness in
maintaining tactical control.

In an effort to restore United States submarine sonar superiority and
eliminate the need to bring on temporary equipment to meet mission
requirements, the Navy began developing the Acoustic Rapid Commercial
off-the-shelf (COTS) Insertion (A-RCI) sonar system, later designated the
AN/BQQ-10(V). Because the $1.5 billion development cost and the $90 million
shipset cost for a new military specification (MIL-SPEC) system were
unaffordable, the A-RCI sonar system was designed from day one to use COTS
hardware and software components to provide the most up-to-date and powerful
computer processing capability possible. This allowed the use of advanced
signal processing algorithms to exploit the much quieter target acoustic
signatures now available.

Using these advanced algorithms, the U.S. Navy submarine force has now
regained the tactical advantage, and an ongoing technology insertion program
means that improvements will continue to be made. In addition, using COTS
components instead of MIL-SPEC hardware brought the development cost down to
about $100 million and the shipset cost down to $10 million. Since the A-RCI
system was designed to replace the different sonar systems on the various
submarine classes with a common system, it also reduced the support
infrastructure and made it possible for all submarines to have the most modern
and capable sonar system available. Commonality also makes it easier to
improve the maintenance and operational skill level, and increase the
operational experience of the sailors serving in the fleet. The A-RCI
program’s experiences in using COTS for a critical military system can be of
great benefit for other defense programs making the same leap into the COTS
world.

Initial  Implementation 

The  first  A-RCI  hardware  suite  consisted 
of  a  combination  of  custom  and  COTS 
Versa  Module  Europa  (VME)1  cards  to 
provide  the  necessary  processing  power  in 
the  limited  space  available  on  a  submarine. 
COTS  operating  systems  and  hardware 
drivers  were  used  to  the  maximum  extent 
practical  to  minimize  the  scope  of  the 
required  software  development  effort. 
However,  several  limitations  with  this 
architecture  were  soon  discovered. 

The  custom  cards  were  prone  to  fail¬ 
ure  and  were  difficult  to  program. 
Although  technically  a  COTS  product,  the 
signal  processing  cards  were  very  special¬ 
ized,  leading  to  high  procurement  costs 
and  the  use  of  an  operating  system  with 
limited  peripheral  driver  support.  The 
implementation of the sonar system also
used the COTS hardware and software in
non-standard ways (e.g., Fibre Channel
networks for interprocessor communications
rather than disk access, and Asynchronous
Transfer Mode local area networks), making
it more difficult to get vendor support
or leverage lessons learned from commercial
implementations.

Finally,  since  the  A-RCI  program  was 
only  a  small  player  in  the  COTS  market, 
receiving  timely  vendor  support  for  prob¬ 
lems  found  during  integration  and  test  was 
a  hit  or  miss  affair.  If  the  vendor  felt  we 
were  a  valuable  customer,  we  would  get 
good  support  for  correcting  noted  prob¬ 
lems;  but  more  likely,  the  vendor  focused 
its  efforts  in  fixing  problems  discovered 
by  its  more  mainstream  customers. 

The  most  important  lesson  learned 
from  this  implementation  was  that  as 
more  mainstream  hardware  and  software 
components  were  used,  fewer  problems 
were  discovered  during  testing,  and  the 
vendor  was  more  likely  to  fix  the  prob¬ 
lems.  This  revelation  became  one  of  the 
tenets  for  the  technology  insertion 
process  that  would  soon  be  implemented. 

The  Technology  Insertion 
Process 

One  of  the  key  enablers  for  both  the  tech¬ 
nology  insertion  process  and  using  COTS 
hardware  in  the  A-RCI  sonar  system  is 
using  Multipurpose  Transportable  Mid¬ 
dleware  (MTM)  to  isolate  the  application 
code  from  the  underlying  hardware  and  its 
associated  drivers  and  operating  systems. 
MTM  was  developed  and  is  still  main¬ 
tained  by  Digital  Systems  Resources,  now 
a  part  of  General  Dynamics  Advanced 
Information  Systems. 

MTM  is  a  freely  licensed  set  of  soft¬ 
ware  utilities  that  allows  for  high-speed 
data  passing  between  the  various  applica¬ 
tion  software  modules  running  in  the  A- 
RCI  sonar  system,  while  isolating  the 
modules  from  the  hardware  and  network 
protocols.  This  isolation  allows  the  hard¬ 
ware  and  associated  drivers  to  be  updated 
without  impacting  the  large  amounts  of 
complex  application  code.  Instead,  the 
impact  of  the  hardware  change  is  limited 
to  the  MTM  that  was  designed  to  easily 
handle  change. 
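
The isolation idea can be illustrated with a minimal sketch. This is not MTM's actual interface, which the article does not describe; the class names, topic name, and functions below are hypothetical. The point is only that application code talks to an abstract transport, so a hardware change is absorbed by swapping the transport implementation.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Optional


class Transport(ABC):
    """Hypothetical hardware-facing layer; only this part changes on a hardware refresh."""

    @abstractmethod
    def send(self, topic: str, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self, topic: str) -> Optional[bytes]: ...


class InMemoryTransport(Transport):
    """Stand-in for one generation of network or backplane driver."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)

    def send(self, topic: str, payload: bytes) -> None:
        self._queues[topic].append(payload)

    def receive(self, topic: str) -> Optional[bytes]:
        queue = self._queues[topic]
        return queue.popleft() if queue else None


class ReversedWireTransport(InMemoryTransport):
    """Stand-in for a newer generation with a different wire format (bytes reversed)."""

    def send(self, topic: str, payload: bytes) -> None:
        super().send(topic, payload[::-1])

    def receive(self, topic: str) -> Optional[bytes]:
        raw = super().receive(topic)
        return None if raw is None else raw[::-1]


def signal_stage(transport: Transport, samples: bytes) -> None:
    """Application code: knows topics and payloads, nothing about the hardware."""
    transport.send("beams", samples.upper())


def display_stage(transport: Transport) -> Optional[str]:
    """Application code on the receiving side, equally unaware of the hardware."""
    data = transport.receive("beams")
    return None if data is None else data.decode()


# The same application functions run unchanged on both "hardware generations".
for transport in (InMemoryTransport(), ReversedWireTransport()):
    signal_stage(transport, b"ping")
    print(type(transport).__name__, display_stage(transport))
```

In this sketch, replacing the transport class is the whole cost of the simulated hardware change; the two application functions are never edited.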

By isolating change from the application
code, many hours (and dollars) are
saved  with  each  hardware  technology 
insertion.  Because  of  MTM’s  benefits,  the 
A-RCI  sonar  system’s  hardware  has  been 
successfully  upgraded  five  times  in  the  last 
seven  years  to  reduce  system  cost  and 
complexity  and  improve  system-process¬ 
ing  performance. 

The  first  two  technology  insertions  to 
the  A-RCI  hardware  baseline  were  done  to 
eliminate  most  of  the  custom  VME  cards 
in  the  system  and  to  provide  improved 
display  performance.  Elimination  of  the 
custom  VME  cards  reduced  system  cost, 
improved  system  reliability,  and  made 
software  programming  easier  and  faster. 
Instead  of  having  to  code  at  an  assembly 
level  to  discrete  hardware  components, 
the  code  could  be  written  in  a  high-level 
language  (typically  C),  and  features  of  the 
COTS  operating  system  could  be  used  to 
the  maximum  extent.  Simplifying  the  cod¬ 
ing  allowed  the  programmers  to  spend 
more  time  writing  better  code  and  debug¬ 
ging  problems  instead  of  dealing  with  the 
details  of  the  hardware  interface. 

The VME signal processors with their
associated proprietary operating systems
and  interfaces  continued  to  be  used  to 
meet  the  processing  density  requirements. 
However,  the  decision  was  made  to 
migrate  the  display  system  from  VME  to  a 
commercial  workstation  technology  when 
it  became  apparent  that  there  would  be  lit¬ 
tle  vendor  support  for  high  performance 
graphics  on  VME  processor  boards. 

After  a  survey  of  available  high-end 
computer  workstations,  the  decision  was 
made  to  use  the  HP  J5000  workstation 
and  the  HP-UX  operating  system.  The 
choice  of  this  widely  used  COTS  operat¬ 
ing  system  opened  the  door  for  display 
development  using  standard  Motif  and 
Open  Graphics  Library  software  libraries. 
Using standard libraries and their application
programming interfaces has made
possible rapid updates to the displays to
fix  problems  and  implement  fleet-user 
recommendations.  This  rapid  response  to 
user  need  has  become  a  hallmark  of  the 
A-RCI  program. 

Starting  in  2000,  the  performance  lev¬ 
els  of  mainstream  COTS  processors 
became  high  enough  to  consider  using 
them  for  complex  signal  processing  appli¬ 
cations.  Since  then,  the  technology  inser¬ 
tion  process  has  focused  on  migrating  the 
remainder  of  the  sonar  system  to  main¬ 
stream  COTS  processors  with  a  main¬ 
stream  operating  system. 

Market surveys in 2000 indicated that
Intel x86 family processors would increase
their domination of the server market and
that the Linux operating system would
become widely supported by device developers.
Based on this research, the signal
processing applications were shifted from
VME cards to Compaq eight-way Pentium
III servers running the Linux operating
system.  An  immediate  impact  of  this  deci¬ 
sion  was  a  large  decrease  in  system  acqui¬ 
sition  cost.  In  addition,  shifting  to  a  sym¬ 
metric  multiprocessor  (SMP)  architecture 
freed  the  programmer  from  having  to  dis¬ 
cretely  control  each  individual  processor 
and  allowed  focusing  on  making  the  appli¬ 
cation  code  as  robust  and  reliable  as  pos¬ 
sible.  Another  benefit  of  using  the  open- 
source  Linux  operating  system  was  its 
broad  user/developer  base  to  help  trou¬ 
bleshoot  problems.  Linux  and  the  soft¬ 
ware  written  to  use  it  are  also  more  famil¬ 
iar  to  most  software  programmers,  leading 
to  higher  productivity. 
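
As a rough illustration of what it means not to control each processor discretely, the sketch below hands a set of independent tasks to a worker pool and lets the runtime and operating system decide where each one runs. The workload and function names are hypothetical, and Python threads are used only to keep the example self-contained; they stand in for whatever threads or processes an SMP operating system would schedule across its processors.

```python
from concurrent.futures import ThreadPoolExecutor


def band_energy(samples):
    """Toy per-channel workload: sum of squares of the samples."""
    return sum(s * s for s in samples)


def process_channels(channels, workers=8):
    """Submit every channel to a pool; no task is pinned to a particular worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even if tasks finish out of order.
        return list(pool.map(band_energy, channels))


channels = [[1, 2, 3], [4, 5], [0, 0, 7]]
print(process_channels(channels))
```

The application code names the work to be done, not the processor that does it, which is the shift in programmer effort the paragraph above describes.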

The 2002 and 2004 technology insertions
continued the migration to mainstream
COTS hardware and software. The
signal processing servers were changed
from the eight-way SMP servers to less
expensive  dual  processor  Intel  XEON- 
based  servers  running  at  higher  clock 
speeds.  In  addition,  the  display  servers 
were  changed  to  dual  processor  Intel 
XEON-based  servers  to  reduce  the  num¬ 
ber  of  different  hardware  types/operating 
systems  present  in  the  sonar  system.  Since 
both  the  display  and  signal  processing 
servers  now  used  a  common  hardware 
baseline,  software  development  was  easier 
because  data  transfer  was  now  simpler  (no 
more  byte  swapping),  and  a  common  set 
of  device  drivers  could  be  used  for  both 
server  types.  Just  as  important,  the  dual 
processor architecture maintained the
previous generation’s flexibility of not having
to  individually  program  each  processor. 
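
The byte swapping mentioned above arises when two machines store multi-byte numbers in opposite byte orders. The sketch below shows the problem in general terms; it assumes, as an illustration only, that the earlier display workstations were big-endian while the x86 servers are little-endian, and the value used is arbitrary.

```python
import struct

value = 0x12345678

# The same 32-bit integer laid out by a big-endian and a little-endian host.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

# A little-endian receiver that reads big-endian bytes without swapping
# reconstructs a different number.
misread = struct.unpack("<I", big)[0]

# Reversing the byte order before interpreting the data recovers the value.
swapped = struct.unpack("<I", big[::-1])[0]

print(big.hex(), little.hex())
print(hex(misread), hex(swapped))
```

When every server shares one byte order, this conversion step and the bugs that come from forgetting it disappear from the data path.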

To  the  maximum  extent  possible,  the 
system  networks  were  also  migrated  to 
Gigabit  Ethernet  to  stay  within  best  com¬ 
mercial  practices  and  provide  the  most 
robust  set  of  hardware  and  device  drivers. 
However,  the  scope  of  change  in  the  2000 
and  2002  technology  insertions  resulted  in 
significant  changes  to  the  system  network 
and  cabinet  enclosures  from  the  previous 
generation.  Therefore,  as  part  of  the  tech¬ 
nology  insertions  in  2002  and  2004,  a  con¬ 
certed  effort  was  made  to  make  the  system 
network  architecture  more  flexible  and  to 
make  the  cabinet  enclosures  easier  to 
upgrade  during  future  technology  inser¬ 
tions.  This  effort  is  succeeding  as  the  cab¬ 
inet  enclosure  and  cabling  system  differ¬ 
ences  between  the  2002  and  2004  technol¬ 
ogy  insertions  are  minimal.  Now,  when  a 
new  processor  design  is  chosen  in  a  future 
technology  insertion,  no  multi-million 
dollar  cabinet  redesign  will  be  required. 

Eliminating  the  need  to  redo  a  large 
portion  of  the  shipboard  cabling  and 
change-out  cabinets  will  also  ensure  that 
future  technology  insertions  can  be  done 
in a standard-length port maintenance
period of about 35 days, at about 20 percent
of the previous cost.
Reducing  change  external  to  the  cabinets 
is  imperative  to  minimizing  the  shipboard 
impact  of  technology  insertion. 

The  Benefits  of  COTS 

The  A-RCI  sonar  program  takes  advan¬ 
tage  of  the  many  benefits  of  using  COTS 
hardware  and  software  for  military  appli¬ 
cations.  A  significant  benefit  is  the  ability 
to  use  computer  systems  much  closer  to 
commercial  state-of-the-art  systems  than 
was  ever  possible  with  MIL-SPEC  sys¬ 
tems.  This  has  allowed  the  use  of 
advanced  computation-intensive  signal 
processing  algorithms  and  easy-to-use  dis¬ 
plays  to  improve  the  operator’s  ability  to 
detect  signals  of  interest.  Using  COTS 
processors  also  makes  it  much  easier  to 
develop,  purchase,  and  install  upgrades  to 
the  sonar  system  to  keep  its  performance 
at  the  highest  possible  level. 

In  addition,  using  standard  rack¬ 
mounted  server  boxes  means  that  ongoing 
improvements  in  commercial  computers 
can  now  be  rapidly  inserted  into  the  sys¬ 
tem  with  minimal  changes  required.  This 
is  similar  to  the  way  a  business  using 
Hewlett  Packard  and  Dell  computers 
would  upgrade  its  server  farm. 
Importantly,  the  ongoing  technology 
insertion  process  eliminates  the  need  to 
maintain  obsolete  COTS  hardware; 
instead,  when  a  ship’s  computer  hardware 
becomes obsolete and unsupportable, it
is  replaced  with  an  up-to-date  system. 

Using  COTS  hardware  components 
brings  the  benefit  of  using  COTS  oper¬ 
ating  systems,  device  drivers,  and 
libraries.  This  has  enabled  the  system 
software  developers  to  focus  on  the 
applications  versus  the  support  software. 
In  addition,  mainstream  software  is  bet¬ 
ter  tested  and  more  robust  than  custom 
software.  Using  open-source  software 
such  as  Linux  brings  the  advantage  of  a 
large  developer  base  so  that  software 
problems  will  be  resolved  in  a  very  time¬ 
ly  manner.  The  large  developer  base  also 
ensures  that  any  security  holes  are  quick¬ 
ly  discovered  and  corrected  and  that  no 
malicious  code  is  inserted  into  the  oper¬ 
ating  system.  This  is  one  reason  why 
Linux  has  a  much  lower  incidence  of 
security  breaches  than  the  proprietary 
Microsoft  Windows  operating  systems. 
Although  COTS  operating  systems, 
device  drivers,  and  libraries  are  used,  the 
critical  application  software  is  still  writ¬ 
ten  and  maintained  by  the  system  devel¬ 
opers  using  secure  facilities. 

The lower hardware cost and the
continuous improvement cycle associated
with commercial computer hardware
are what allow the A-RCI technology
insertion process to succeed. If the cost
of  hardware  components  were  equiva¬ 
lent  to  the  MIL-SPEC  hardware  used  in 
the  past,  the  pace  of  system  upgrades 
would  be  unaffordable  and  the  Navy 
would  soon  be  behind  the  technology 
curve  like  it  was  in  the  mid-90s.  Using  a 
COTS  technology  insertion  process  has 
enabled a 10x increase in system
throughput  and  an  86  percent  reduction 
in  hardware  cost  per  billion  floating 
point  operations  per  second  in  a  six-year 
period.  Low  hardware  cost  has  also 
allowed  the  A-RCI  sonar  program  to 
purchase  system  equipment  from  several 
vendors,  ensuring  that  a  continuous  price 
competition  exists. 
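
To put the six-year figures on an annual footing, the short calculation below converts them to equivalent compound yearly rates. It assumes smooth year-over-year compounding, which is a simplification: the actual gains arrived in discrete technology insertions.

```python
throughput_gain = 10.0          # 10x increase in system throughput (from the text)
cost_ratio = 1.0 - 0.86         # an 86 percent reduction leaves 14 percent of the cost
years = 6                       # period stated in the text

# Equivalent compound annual rates under the smooth-compounding assumption.
annual_throughput_growth = throughput_gain ** (1 / years) - 1
annual_cost_decline = 1 - cost_ratio ** (1 / years)

print(f"throughput growth per year: {annual_throughput_growth:.1%}")
print(f"cost per GFLOPS decline per year: {annual_cost_decline:.1%}")
```

Under that assumption, throughput grew by roughly 47 percent per year while hardware cost per billion floating point operations per second fell by roughly 28 percent per year.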

Because  integrating  COTS  compo¬ 
nents  is  within  the  capability  of  firms 
much  smaller  than  the  traditional  major 
Department  of  Defense  (DoD)  contrac¬ 
tors,  a  much  broader  business  base  is  also 
available.  Configuration  control  of  the 
system  is  maintained  by  requiring  all  sys¬ 
tem  equipment  vendors  to  work  together 
in  specifying  the  COTS  components. 

The  Downside  of  COTS 

The  downside  to  using  COTS  software  is 
the  lack  of  insight  into  the  code  details. 
Since  the  system  contractor  does  not 
write  the  software,  the  programmers  have 
a much-reduced understanding of the
code than they would have with internally
developed  software.  This  could  make  the 
development  team  dependent  on  the 
skills  of  the  open-source  community  to 
fix  any  problems  noted  during  integration 
and  testing  -  an  unacceptable  situation. 
This  situation  is  prevented  by  researching 
the  COTS  products  selected  to  verify  they 
are  in  use  by  many  other  developers  with 
similar  applications  and  requirements. 
This  broad  user  base  helps  ensure  the 
software  is  well  tested  and  robust  before 
it  is  used  by  the  A-RCI  system. 

A  more  significant  downside  is  a 
result  of  using  COTS  hardware  in  a  non¬ 
office  environment.  COTS  servers  are 
designed  for  use  in  well  air-conditioned 
spaces  and  not  the  sealed,  water-cooled 
cabinets  used  on  submarines.  Cooling 
the  processors  has  become  a  significant 
issue,  currently  limiting  the  team’s  ability 
to  use  the  full  capability  of  today’s 
processors.  In  the  future,  it  may  not  be 
possible  to  continue  providing  increased 
processing  power  with  each  technology 
insertion  unless  improved  cabinet  cool¬ 
ing  methods  can  be  implemented. 

Process  Migration 

The  benefits  of  this  new  COTS  business 
model  have  so  significantly  outweighed 
the  disadvantages  (primarily  with  respect 
to  cost  and  rate  at  which  capability  can 
be  added)  that  the  model  has  been 
expanded  to  include  the  entire  non¬ 
propulsion  electronics  suite  on  the 
newest  class  of  submarine:  the  USS 
Virginia  attack  submarine.  The  process 
has  expanded  from  what  was  simply  a 
single  sonar  sensor  and  processor  to  a 
20-million  source  lines  of  code  system  of 
systems  that  includes  all  sensors,  ship’s 
navigation, combat/fire control, and
ship  monitoring  functions. 

Rapid  COTS  insertion  is  also  being 
used  to  upgrade  older  submarine  classes’ 
combat  control  systems  and  is  planned 
for  use  on  undersea  weapons.  This  abili¬ 
ty  to  rapidly  insert  improved  capability  in 
the  form  of  software  and  hardware  has 
become  a  hallmark  of  acquisition 
reform.  Software  and  hardware  solutions 
that  are  one-time  developments  are  now 
implemented  in  many  systems,  including 
those  in  use  on  submarines,  surface 
ships,  undersea  surveillance  systems,  and 
aircraft. 

Conclusion 

The  AN/BQQ-10(V)  A-RCI  sonar  sys¬ 
tem  would  not  be  the  success  it  is  today 
without  its  embrace  of  COTS  hardware 
and software. The only way to economically
take advantage of the advances in
computer processing is to buy from the
mainstream  market.  The  less  the  hard¬ 
ware  has  to  be  modified  to  work  in  the 
system,  the  more  rapidly  and  inexpen¬ 
sively  it  can  be  implemented.  Moreover, 
COTS  hardware  brings  with  it  COTS 
software.  The  contractor  must  learn  to 
live  within  the  limitations  of  the  soft¬ 
ware  and  not  try  to  make  it  incremental¬ 
ly  better.  Time  spent  in  this  manner  is 
time  not  spent  improving  the  more  criti¬ 
cal  system  application  software.  By  pick¬ 
ing  COTS  software  that  is  well  used  and 
tested,  the  contractor  can  reduce  prob¬ 
lems  observed,  but  also  must  accept  the 
loss  of  total  control  over  the  code. 

If  COTS  hardware  is  used,  an  ongo¬ 
ing  technology  insertion  program  is 
required  to  reduce  obsolescence  issues 
and  maintain  the  system  at  its  highest 
capability.  A-RCI  has  successfully  imple¬ 
mented  five  technology  insertions  and 
has  an  ongoing  plan  to  continue  with  the 
process.  Making  the  technology  insertion 
process  affordable  is  the  MTM,  which 
helps  to  isolate  the  complex  application 
code  from  the  underlying  hardware  and 
device  drivers.  A-RCI  has  shown  that  it  is 
possible  to  reap  the  benefits  of  COTS 
computer  hardware  and  software  while 
still  meeting  all  military  requirements.  It 
is  now  up  to  other  DoD  programs  to 
make  the  same  leap. 

Final  Points 

Adopting an RCI process is not painless.
Organizational bias, MIL-SPEC thinking,
severe skepticism, and the not-invented-here
syndrome were tremendous challenges
for the A-RCI program to overcome
in its early stages. It
required  an  extraordinary  culture  shift  for 
all  stakeholders  to  achieve  what  today  is 
almost  taken  for  granted.  The  combined 
sense  of  urgency  due  to  1)  the  need  to 
regain  technological  superiority,  and  2) 
severe  budget  cuts  drove  the  U.S.  Navy’s 
submarine  force  to  the  RCI  solution. 
Without  those  kinds  of  drivers,  no 
amount  of  hearing  this  is  a  good  idea  will 
result  in  other  DoD  programs  adopting 
RCI  processes.  RCI  is  a  business  decision 
that  requires  dedicated  believers  to  suc¬ 
ceed  and  change  the  status  quo  of  systems 
acquisition,  and  properly  leverage  the 
power  and  agility  of  the  commercial,  non¬ 
government  business  world. 

Knowing  that  RCI  may  be  the  only 
efficient  way  to  quickly  regain  technolog¬ 
ical  superiority  at  a  reduced  cost  does  not 
mean  that  those  who  manage  such  pro¬ 
grams  should  always  sleep  well  at  night. 
What  have  been  discussed  here  are 
implementations  in  modern  sensor  and 
combat systems. These are systems that
warfighters  depend  on  when  they  are  in 
harm’s  way  to  be  100  percent  effective.  It 
is  this  balance  between  efficiency  in  cost 
and  effectiveness  in  war  that  should  keep 
program  managers  awake  at  night  with 
these  questions: 

•  How  can  we  be  100  percent  assured 
that  COTS  products  contain  no 
latent  defects  that  may  have  deadly 
consequences  when  they  manifest 
themselves? 

•  What  represents  the  necessary  and 
sufficient  testing  and  verification  to 
preclude  unacceptable  consequences? 

•  How  much  do  we  really  want  to  be 
dependent  on  a  potentially  fickle 
commercial  market  for  critical  sys¬ 
tems  in  our  military  machines? 

•  What is the minimum acceptable
cost/risk ratio for a critical technology?

•  What is the necessary and sufficient
amount of discipline required in the
process so that capability is rapidly
inserted, without undue risk, and without
unduly constraining the process?

It is the description, quantification,
understanding,  and  reconciliation  of 
these  issues  and  their  risks  that  must 
become  the  main  focus  and  challenge  of 
the  program  manager’s  efforts  in  an  RCI 
program.  They  certainly  have  become 
the focus for the submarine force’s
A-RCI program.

Note 

1.  VME  is  a  standard  developed  in  1981 
for  embedded  computer  hardware 
form  factor  and  data  transfer  protocol. 
A Survey of Anti-Tamper Technologies


Dr.  Mikhail  J.  Atallah,  Eric  D.  Bryant,  and  Dr.  Martin  R.  Stytz 

Arxan  Technologies,  Inc. 

This article surveys the various anti-tamper (AT) technologies used to protect software. The primary objective of AT
techniques is to protect critical program information by preventing unauthorized modification and use of software. This
protection goal applies to any program that requires protection from unauthorized disclosure or inadvertent transfer of
leading-edge technologies and sensitive data or systems. In this article, we review the various approaches to AT techniques
and their strengths and weaknesses, and briefly discuss a process for developing program protection plans. We also survey
the tools that are typically used to circumvent AT protections, and techniques that are commonly used to make these
protections more resilient against such attacks.


The  unauthorized  modification  and 
subsequent  misuse  of  software  is 
often  referred  to  as  software  cracking. 
Usually,  cracking  requires  disabling  one  or 
more  software  features  that  enforce  poli¬ 
cies  (of  access,  usage,  dissemination,  etc.) 
related  to  the  software.  Because  there  is 
value  and/or  notoriety  to  be  gained  by 
accessing  valuable  software  capabilities, 
cracking  continues  to  be  common  and  is  a 
growing  problem. 

To  combat  cracking,  anti-tamper  (AT) 
technologies  have  been  developed  to  pro¬ 
tect  valuable  software.  Both  hardware  and 
software  AT  technologies  aim  to  make 
software  more  resistant  against  attack  and 
protect  critical  program  elements. 
However,  before  discussing  the  various 
AT  technologies,  we  need  to  know  the 
adversary’s  goals.  What  do  software  crack¬ 
ers  hope  to  achieve?  Their  purposes  vary, 
and  typically  include  one  or  more  of  the 
following: 

•  Gaining unauthorized access. The
attacker’s goal is to disable the software
access control mechanisms built
into  the  software.  After  doing  so,  the 
attacker  can  make  and  distribute  illegal 
copies  whose  copy  protection  or 
usage  control  mechanisms  have  been 
disabled  —  this  is  the  familiar  software 
piracy  problem.  If  the  cracked  soft¬ 
ware  provides  access  to  classified  data, 
then  the  attacker’s  real  goal  is  not  the 
software  itself,  but  the  data  that  is 
accessible  through  the  software.  The 
attacker  sometimes  aims  at  modifying 
or  unlocking  specific  functionality  in 
the  program,  e.g.,  a  demo  or  export  ver¬ 
sion  of  software  is  often  a  deliberate¬ 
ly  degraded  version  of  what  is  other¬ 
wise  fully  functional  software.  The 
attacker  then  seeks  to  make  it  fully 
functional  by  re-enabling  the  missing 
features. 

•  Reverse  engineering.  The  attacker 
aims to understand enough about the
software to steal key routines, to gain
access to proprietary intellectual property,
or to carry out code-lifting, which
consists  of  reusing  a  crucial  part  of 
the  code  (without  necessarily  under¬ 
standing  the  internals  of  how  it 
works)  in  some  other  software.  Good 
programming  practices,  while  they 
facilitate  software  engineering,  also 
tend  to  simultaneously  make  it  easier 
to  carry  out  reverse  engineering 
attacks.  These  attacks  are  potentially 
very  costly  to  the  original  software 
developer  as  they  allow  a  competitor 
(or  an  enemy)  to  nullify  the  develop¬ 
er’s  competitive  advantage  by  rapidly 
closing  a  technology  gap  through 
insights  gleaned  from  examining  the 
software. 

•  Violating  code  integrity.  This  famil¬ 
iar  attack  consists  of  either  injecting 
malicious code (malware) into a program,
injecting code that is not malevolent
but illegally enhances a program’s
functionality, or otherwise subverting
a program so it performs new
and  unadvertised  functions  (functions 
that  the  owner  or  user  would  not 
approve  of).  While  AT  technology  is 
related  to  anti-virus  protection,  it  has 
some  crucial  differences.  AT  technol¬ 
ogy  is  similar  to  virus  protection  in 
that  it  impedes  malware  infection  of 
an  AT-protected  executable.  However, 
AT  technology  differs  from  virus  pro¬ 
tection  in  that  the  AT  technology’s 
goal  is  not  only  to  protect  the  client’s 
software  from  unauthorized  modifica¬ 
tion  by  malevolent  outsiders  (infection 
by  malware  written  by  others),  but  also 
to  protect  the  software  from  modifica¬ 
tion  by  an  authorized  client.  In  many 
situations,  it  is  important  that  only 
authorized  applications  execute  (e.g., 
in  a  taximeter,  odometer,  or  any  situa¬ 
tion  where  tampering  is  feared),  using 
only authorized functionality, and that
only valid data is used.

It should be clear by now that AT
technology is not only about anti-piracy; it
has the equally important and broader aim
of policy enforcement. That aim is to enforce the
policies  of  the  software  publisher  about 
the  proper  use  of  the  software,  even  as 
the  software  is  running  in  a  potentially 
hostile  environment  where  the  user  owns 
the  processor  and  is  intent  on  violating  those 
policies. 

There  is  a  plethora  of  AT  protection 
mechanisms.  These  include  encryption 
wrappers,  code  obfuscation,  guarding, 
and watermarking/fingerprinting in addition
to various hardware techniques.
While  these  techniques  are  discussed  sep¬ 
arately  for  pedagogical  purposes,  the 
reader  should  bear  in  mind  that  software 
is  best  protected  when  several  protection 
techniques  are  used  together  in  a  mutual¬ 
ly  supportive  manner.  No  technique  is 
invulnerable  or  even  clearly  superior  to 
the  others  in  all  circumstances;  therefore, 
a  mix  of  protection  techniques  allows  the 
defense  to  capitalize  on  the  strengths  of 
each  technique  while  also  masking  the 
shortfalls  of  other  techniques.  In  the  fol¬ 
lowing  paragraphs  we  present  a  brief 
overview  of  these  techniques. 

Hardware-Based Protections

The  most  common  hardware  approach 
uses  a  trusted  processor.  The  trusted, 
tamper-resistant  hardware  checks  and  ver¬ 
ifies  every  piece  of  hardware  and  software 
that  exists  —  or  that  requests  to  be  run  on 
a  computer  —  starting  at  the  boot-up 
process  [1].  This  hardware  could  guaran¬ 
tee  integrity  by  checking  every  entity 
when  the  machine  boots  up,  and  every 
entity  that  will  be  run  or  used  on  that 
machine  after  it  boots  up.  The  hardware 
could,  for  example,  store  all  of  the  keys 
necessary  to  verify  digital  signatures, 
decrypt  licenses,  decrypt  software  before 
running  it,  and  encrypt  messages  during 
any online protocols it may need to run
(e.g.,  for  updates)  with  another  trusted 
remote  entity  (such  as  the  software  pub¬ 
lisher). 
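
The checking step can be sketched in a few lines. This is a simplified illustration rather than a description of any particular trusted processor: the component names and contents are hypothetical, and a plain table of approved SHA-256 digests stands in for the digital signature verification that real trusted hardware would perform with keys held inside the device.

```python
import hashlib


def digest(blob: bytes) -> str:
    """SHA-256 digest of a component image, as a hex string."""
    return hashlib.sha256(blob).hexdigest()


# Hypothetical table of approved components, imagined to live inside the
# tamper-resistant hardware where an attacker cannot edit it.
APPROVED = {
    "bootloader": digest(b"bootloader v1"),
    "kernel": digest(b"kernel v1"),
}


def verify_boot_chain(components: dict) -> list:
    """Return the names of components that are unknown or modified."""
    rejected = []
    for name, blob in components.items():
        if APPROVED.get(name) != digest(blob):
            rejected.append(name)
    return rejected


clean = {"bootloader": b"bootloader v1", "kernel": b"kernel v1"}
tampered = {"bootloader": b"bootloader v1", "kernel": b"kernel v1 + patch"}

print(verify_boot_chain(clean))
print(verify_boot_chain(tampered))
```

An empty result means every component matched its approved digest; any name in the list would cause the hardware in this sketch to refuse to run that component.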

Software  downloaded  onto  a  machine 
would  be  stored  in  encrypted  form  on  the 
hard  drive  and  would  be  decrypted  and 
executed  by  the  hardware,  which  would 
also encrypt and decrypt the information it
sends to and receives from its random access
memory. The same software or media
could be encrypted in a different way for
each trusted processor that would execute
it because each processor would have a
distinctive decryption key. This would put
quite  a  dent  in  the  piracy  problem,  as  dis¬ 
seminating  your  software  or  media  files  to 
others  would  not  do  them  much  good 
(because  their  own  hardware  would  have 
different  keys). 
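
The effect of per-processor keys can be shown with a toy example. The cipher below is a deliberately simple keystream construction built from SHA-256 so that the sketch needs only the standard library; it is an assumption made for illustration, is not secure, and is not what trusted hardware would actually use. The key values and image contents are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of the key plus a counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice with the same key decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


software = b"licensed program image"
key_a = b"processor-A-key"   # hypothetical key burned into processor A
key_b = b"processor-B-key"   # hypothetical key burned into processor B

copy_for_a = xor_cipher(key_a, software)

print(xor_cipher(key_a, copy_for_a) == software)   # processor A recovers the program
print(xor_cipher(key_b, copy_for_a) == software)   # processor B cannot
```

A copy prepared for one processor is useless on another, which is why redistributing the encrypted file does the recipient little good.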

A less drastic protection than a
separate, trusted hardware computational
device still involves hardware but is more
lightweight, such as a smart card or physically
secure token. These lightweight hardware
protection techniques usually require
that  the  hardware  be  present  for  the  soft¬ 
ware  to  run,  to  have  certain  functionality, 
to  access  a  media  file,  etc.  Defeating  this 
kind  of  protection  usually  requires  working 
around  the  need  for  the  hardware  rather 
than  duplicating  the  hardware.  The  diffi¬ 
culty  of  this  work-around  depends  on  the 
role  that  the  tamper-resistant  hardware 
plays  in  the  protection.  A  device  that  just 
outputs  a  serial  number  is  trivially  vulner¬ 
able  to  a  replay  attack  (e.g.,  an  attacker 
replays  a  valid  serial  number  to  the  soft¬ 
ware,  without  the  presence  of  the  hard¬ 
ware  device),  whereas  a  smart  card  that 
engages  in  a  challenge-response  protocol 
(different  data  each  time)  prevents  the 
simple  replay  attack  but  is  still  vulnerable 
(e.g.,  to  modification  of  the  software 
interacting  with  the  smart  card).  A  device 
that  decrypts  content  or  that  provides 
some  essential  feature  of  a  program  or 
media  file  is  even  harder  to  defeat. 
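
The difference between a fixed serial number and a challenge-response exchange can be sketched as follows. The serial number, class, and function names are hypothetical, and HMAC-SHA-256 is an assumed choice of response function; the article does not specify one.

```python
import hashlib
import hmac
import secrets

SERIAL = b"SN-0001"  # hypothetical serial number


def check_serial(respond) -> bool:
    """Weak scheme: the token always returns the same serial number."""
    return respond() == SERIAL


class SmartCard:
    """Hypothetical card that keys an HMAC with a secret it never reveals."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()


def check_card(secret: bytes, respond) -> bool:
    """Stronger scheme: a fresh random challenge makes old responses useless."""
    challenge = secrets.token_bytes(16)
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, respond(challenge))


secret = secrets.token_bytes(32)
card = SmartCard(secret)

# The attacker eavesdrops on one exchange of each kind and replays it later.
recorded_serial = SERIAL
recorded_response = card.respond(b"\x00" * 16)

print(check_serial(lambda: recorded_serial))             # replay succeeds
print(check_card(secret, card.respond))                  # genuine card passes
print(check_card(secret, lambda _c: recorded_response))  # replay fails
```

Note that in this sketch the checking software itself holds the secret and makes the pass/fail decision, so an attacker who can modify that software can still bypass the check, which is the remaining vulnerability the paragraph above points out.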

Advantages  and  Drawbacks 

The  chief  advantage  of  hardware-based 
protection  techniques  is  that  they  run  on 
a  trusted  CPU  and  can  be  made  arbitrari¬ 
ly  complex  —  hence,  difficult  to  defeat 
while  inflicting  minimal  computational 
cost  on  the  protected  software  once  it  has 
been  decrypted  within  the  hardware  and 
is  running.  However,  there  is  a  cost  to 
decrypt  it  in  the  first  place,  and  also  to 
encrypt  everything  that  goes  out  to  the 
non-protected  part  of  the  system,  and 
then  decrypt  it  when  it  comes  back  into 
the  trusted  hardware. 

In addition, it is generally more difficult
to successfully attack tamper-resistant
hardware and make the exploit directly
available to others than it is to do so with a
software-only protection scheme. This point holds only
for  a  properly  designed  system.  A  com¬ 
promise  of  hardware  that  imprudently 
contains  the  same  secret  keys  as  all  other 
hardware  of  the  same  type  would  lead  to 
widely  reproducible  exploits. 

The  advantages  of  hardware  protec¬ 
tion  also  include  its  capability  to  enforce 
such  rules  as  “only  approved  peripherals 
can  be  a  part  of  this  computer  system,”  or 
“only  approved  (through  digital  signa¬ 
tures)  software  and  contents  are  allowed,” 
etc. 

Nevertheless, hardware-based protection
also has its drawbacks. There is the
usual problem of inflexibility: hardware-based
protections are more awkward to
modify, port, and update than software-based
ones. They are also less secure than
commonly  assumed  and  can  be  broken; 
see,  e.g.,  [2].  To  date,  it  has  not  been 
demonstrated  that  hardware  protections 
can  scale  to  grid  computing  or  to  small- 
scale  computing.  In  addition,  there  is  no 
guarantee  that  all  avenues  of  attack  are 
closed  by  hardware  protection,  and  there 
is  a  significant  cost  attached  to  using 
hardware  protection;  the  cost  is  driven 
mainly  by  the  time  needed  to  assemble, 
integrate,  and  test  the  hardware  protec¬ 
tion  technique. 

Additional drawbacks to the hardware protection approach include its
expense and general fragility to accidents (an electric power surge
that fries the processor also renders the hard drive contents unusable
because the key that decrypts them is destroyed). The potential
implications for censorship are also chilling. Another disadvantage of
hardware protection is the boot-up time and the time spent encrypting
and decrypting, which makes the approach problematic for low-end
machines and embedded systems (unless the whole system lies within
tamper-resistant hardware).

Using  trusted  hardware  also  incurs 
many  indirect  costs  as  a  result  of  the  ear¬ 
lier-mentioned  limitations  it  imposes  (e.g., 
the  restriction  to  only  certain  approved 
hardware,  software,  and  media  creates  a 
barrier  to  competition  that  leads  to  high¬ 
er  prices).  Due  to  the  imperfect  protec¬ 
tion  offered  by  hardware  protection,  a 
more  robust  approach  to  software  securi¬ 
ty  interweaves  hardware  protection  with 
other  protection  techniques  such  as  those 
discussed  in  the  following  sections. 

The  rest  of  this  article  discusses  the 
various  software-based  protection  mecha¬ 
nisms.  The  reader  should  keep  in  mind 
that  hardware  and  software  protection 
techniques  are  not  mutually  exclusive.  A 
judicious  combination  can  serve  to 
increase  the  security  of  the  system  more 
than  any  of  its  individual  component 
techniques. 

Encryption  Wrappers 

With  encryption  wrapper  software  securi¬ 
ty,  critical  portions  of  the  software  (or 
possibly  all  of  it)  are  encrypted  and 
decrypted  dynamically  at  run-time.  The 
encryption  wrapper  approach  works  well 
against  a  static  attack,  and  forces  the 
attacker  to  run  the  program  in  order  to 
get  an  unencrypted  image  of  it.  To  make 
the attacker’s task harder, at no time during execution is the whole
software in the clear; code decrypts just before it executes,
leaving  other  parts  of  the  program  still 
encrypted.  Therefore,  no  single  snapshot  of 
memory  can  expose  the  whole  decrypted 
program.  Of  course,  the  attacker  can  take 
many  such  snapshots,  compare  them,  and 
piece  together  the  unencrypted  program. 
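
The block-at-a-time idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XOR keystream built from SHA-256 is a toy stand-in for a lightweight cipher, the names `xor_crypt`, `ProtectedProgram`, and `run_block` are hypothetical, and a real wrapper would operate on machine code rather than on source strings.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not vetted crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

class ProtectedProgram:
    """Holds every code block encrypted and decrypts one block only while it runs."""

    def __init__(self, blocks: dict[str, str], master_key: bytes):
        self._master_key = master_key
        self._encrypted = {
            name: xor_crypt(src.encode(), self._block_key(name))
            for name, src in blocks.items()
        }

    def _block_key(self, name: str) -> bytes:
        # A distinct key per block, so one recovered key exposes only one block.
        return hashlib.sha256(self._master_key + name.encode()).digest()

    def run_block(self, name: str, env: dict) -> None:
        plain = xor_crypt(self._encrypted[name], self._block_key(name))
        exec(plain.decode(), env)   # the block is in the clear only here
        del plain                   # drop the reference; other blocks stay encrypted

blocks = {
    "init": "total = 0",
    "work": "total = total + sum(range(5))",
}
program = ProtectedProgram(blocks, master_key=b"demo key")
env: dict = {}
program.run_block("init", env)
program.run_block("work", env)
print(env["total"])  # 10
```

Note that the master key lives in the same process as the protected blocks, so the sketch shares the weakness of any software-only wrapper: an attacker who runs the program can observe each block as it is decrypted.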

Another avenue of attack is to figure out the various decryption keys
that are present in the software. One defensive technique that can be
used to delay the attacker is to include mechanisms in the program
that deny the attacker the use of run-time attack tools, e.g.,
anti-debugger, anti-memory dump, and other defensive mechanisms, which
make it more difficult for the attacker to run and analyze the program
in a synthetic (virtual machine) environment. Yet, a determined
attacker can usually defeat these protections (e.g., through the use
of virtual machines that faithfully emulate a PC, including the most
rarely used instructions, cache behavior, etc.).

Encryption  wrappers  often  use  light¬ 
weight  encryption  to  minimize  the  compu¬ 
tational  cost  of  executing  the  protected 
program.  The  encryption  can  be  advanta¬ 
geously  combined  with  compression:  Not 
only  does  this  result  in  a  smaller  amount 
of  storage  usage,  but  it  also  makes  the 
encryption  harder  to  defeat  by  cryptanaly¬ 
sis  (of  course  one  compresses  before 
encryption,  not  the  other  way  around). 
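
The ordering matters because well-encrypted data looks random and no longer compresses. The sketch below illustrates this with Python's standard `zlib` module; the `toy_encrypt` function is a hypothetical XOR stream cipher used only for illustration, and the repeated instruction text is an assumed stand-in for a redundant program image.

```python
import hashlib
import zlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256 counter-mode keystream (toy cipher, illustration only)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Highly redundant stand-in for a program image.
code_image = b"mov eax, ebx; add eax, 1; " * 200
key = b"demo key"

compress_then_encrypt = toy_encrypt(zlib.compress(code_image), key)
encrypt_then_compress = zlib.compress(toy_encrypt(code_image, key))

# Sizes: original, compress-then-encrypt, encrypt-then-compress.
print(len(code_image), len(compress_then_encrypt), len(encrypt_then_compress))

# Recovery reverses the order: decrypt first, then decompress.
recovered = zlib.decompress(toy_encrypt(compress_then_encrypt, key))
```

Compressing first also removes redundancy from the plaintext, which is the property the text credits with making cryptanalysis harder.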

An  encryption  wrapper’s  chief  advan¬ 
tage  is  that  it  effectively  hinders  an  attack¬ 
er’s  ability  to  statically  analyze  a  program. 
The  attacker  is  then  forced  to  perform 
more  sophisticated  types  of  dynamic 
attacks,  which  can  significantly  increase 
the  amount  of  time  needed  to  defeat  the 
protection. The main disadvantages of encryption wrappers are the
performance penalty caused by the decryption overhead and their
vulnerability to memory dumps: before it can run, encryption-protected
software must be decrypted, at which point it becomes exposed.

Code  Obfuscation 

Code  obfuscation  consists  of  transform¬ 
ing  code  so  it  becomes  less  intelligible  to 
a  human,  thus  making  it  not  only  harder 
to  reverse  engineer,  but  also  harder  to 
tamper  with.  In  software  that  has  specific 
areas  where  policy  checks  are  made,  these 
areas  will  be  harder  to  identify  and  disable 
after  the  software  has  been  obfuscated. 
Obfuscation is usually carried out by inserting or performing
obfuscating transformations. These transformations must not damage a
program’s functionality, and they must have only a moderate impact on
code performance and on the storage space used on the disk and at
run-time (of the two, speed is more important).

The  obfuscation  must  also  be  resilient 
to  attack,  and  for  this  reason  it  is  desirable 
to  maximize  the  obscurity  of  the  obfuscat¬ 
ed  software.  The  obfuscating  transforma¬ 
tions  need  to  be  resilient  against  tools 
designed  to  automatically  undo  them,  and 
to  not  be  easily  detectable  by  statistical 
analysis  of  the  resulting  code  (resilience 
to  statistical  analysis  makes  it  harder  for 
automatic  tools  to  find  the  locations 
where  these  transformations  were 
applied). 

The  different  types  of  obfuscation 
transformations  that  have  been  proposed 
[3]  include  the  following: 

•  Layout obfuscation. This modifies the physical appearance of the
code, e.g., replacing important variables with random strings,
removing all formatting (making nested conditional statements harder
to read), etc. Such transformations are easy to make but are effective
only when combined with other transformation techniques.

•  Data  obfuscation.  This  obscures  the 
data  structures  used  within  a  program, 
e.g.,  the  representation  and  the  meth¬ 
ods  of  using  that  data,  independent 
data  merging  (and  vice-versa  —  split¬ 
ting  up  data  that  is  dependent),  etc. 
Data  obfuscation  serves  to  delay  the 
attacker  because  data  structures  con¬ 
tain  important  information  that  any 
attacker  needs  to  comprehend  before 
launching  an  attack. 

•  Control  obfuscation.  This  manipu¬ 
lates  the  control  flow  of  a  program  to 
make  it  difficult  to  discern  its  original 
structure,  e.g.,  through  merging  (or 
splitting)  various  fragments  of  code, 
reordering  expressions,  loops,  or 
blocks,  etc.  It  is  similar  to  creating  a 
spurious  program  that  is  entangled  with 
the  original  program  so  as  to  obscure 
the  important  control  features  of  that 
program. 

•  Preventive  transformations.  These 
aim  at  making  it  difficult  for  a  de¬ 
obfuscation  tool  to  extract  the  true 
program  from  the  obfuscated  version 
of  it.  Preventive  transformations  can 
be  implemented  by  using  what 
Collberg  [4]  calls  opaque  predicates, 
an  example  of  which  is  a  conditional 
statement  that  always  evaluates  as 
true,  but  in  a  manner  that  is  hard  to 
recognize. 
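
Two of the transformations listed above can be shown on a small function. The following is a hand-made sketch, not the output of any obfuscation tool: the function names are hypothetical, the renaming stands in for layout obfuscation, and the opaque predicate relies on the fact that the product of two consecutive integers is always even.

```python
def original(balance: int, amount: int) -> int:
    """Withdraw amount if funds allow; otherwise leave the balance unchanged."""
    if amount <= balance:
        return balance - amount
    return balance

def obfuscated(a: int, b: int) -> int:
    # Layout obfuscation: meaningless names, no documentation.
    # Opaque predicate: b * (b + 1) is a product of consecutive integers,
    # so it is always even and the test below is always true.
    if (b * (b + 1)) % 2 == 0:
        if b <= a:
            return a - b
        return a
    # Dead branch: never executed, present only to mislead the reader.
    return a + b

# Both versions compute the same result for every input tried here.
for balance, amount in [(100, 30), (10, 50), (0, 0), (7, 7)]:
    print(original(balance, amount), obfuscated(balance, amount))
```

A predicate this simple would not survive an automatic de-obfuscator; it only illustrates the idea of a condition whose outcome is fixed but not obvious.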

Obfuscation  can  be  done  at  the 
source-code  level  (source-to-source 
translation)  or  at  the  assembly  level. 
Although  most  obfuscators  are  of  the 
former  kind  (source-to-source),  assembly 
level  obfuscation  is  better  because  it 
effectively  hides  the  operation  of  the 
binary.  If  the  source-code  level  transfor¬ 
mations  hide  information  by  adding 
crude  and  inefficient  ways  of  doing  sim¬ 
ple  tasks,  then  the  code  optimizer  in  the 
compiler  may  undo  them.  If,  on  the  other 
hand,  the  transformations  are  clever 
enough  to  fool  the  optimizer,  then  it  can 
fail  to  properly  do  its  job,  and  the  perfor¬ 
mance  of  the  resulting  code  suffers.  Low- 
level  obfuscation  does  not  prevent  the 
code  optimizer  from  doing  its  job,  but  if 
done  carelessly  it  runs  the  risk  of  produc¬ 
ing  code  that  looks  so  different  from  the 
kind  produced  by  the  compiler  that  it 
inadvertently  flags  the  areas  where  the 
transformations  were  applied. 

Obfuscation transformations are classified according to several
criteria: how much obscurity they add to the program (potency), how
difficult they are to break for a de-obfuscator (resilience), and how
much computational overhead they add to the obfuscated application
(cost). In [4], software complexity metrics are used to formalize the
notion of transformation potency and resilience.

The  potency  of  a  transformation 
measures  how  much  more  difficult  the 
obfuscated  code  is  to  understand  for  a 
human  than  the  original  code.  On  the 
other  hand,  the  resilience  of  a  transfor¬ 
mation  measures  how  well  it  stands  up  to 
attack  by  an  automatic  de-obfuscator. 
The  resilience  measurement  takes  two 
factors  into  account:  the  programmer 
effort required to construct the de-obfuscator and the execution time
and space
required  by  the  de-obfuscator  to  reduce 
the  potency  of  the  transformation.  The 
best  obfuscation  is  usually  achieved  by  a 
combination  of  the  above  three  men¬ 
tioned  transformations.  The  combination 
of  the  three  approaches  provides  a  well- 
balanced  mix  of  highly  potent  and 
resilient  transformations. 

Like all software-only protections, obfuscation can delay, but not
prevent, a determined attacker intent on reverse engineering the
software. Barak [5] presents a family of functions that are provably
impossible to completely and successfully obfuscate. For more
information and a discussion of code obfuscation, refer to
[3, 4, 6, 7].

Software  Watermarking  and 
Fingerprinting 

The  goal  of  watermarking  is  to  embed 
information  into  software  in  a  manner 
that  makes  it  hard  to  remove  by  an  adver¬ 
sary  without  damaging  the  software’s 
functionality.  The  information  inserted 
could  be  purchaser  information,  or  it 
could  be  an  integrity  check  to  detect 
modification,  the  placing  of  caption-type 
information,  etc.  A  watermark  need  not 
be  stealthy;  visible  watermarks  act  as  a 
deterrent  (against  piracy,  for  example), 
but  most  of  the  literature  has  focused  on 
stealthy  watermarks.  In  steganography 
(the  art  of  concealing  the  existence  of 
information  within  seemingly  innocuous 
carriers),  the  mark  is  required  to  be 
stealthy:  its  very  existence  must  not  be 
detectable  [8]. 

CrossTalk The Journal of Defense Software Engineering, November 2004
A Survey of Anti-Tamper Technologies

A specific type of watermarking is fingerprinting, which embeds a
unique message in each instance of the software for traitor tracing.
This has consequences for the adversary’s ability to attack the
watermark: two differently marked copies often make possible a diff
attack that compares them and can enable the adversary to create a
usable copy that carries neither of the two marks. Thus, in any
fingerprinting scheme, it is critical to use techniques that are
resilient against such comparison attacks.
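
A toy model shows why a naive fingerprint is exposed by comparison. In this sketch the mark is written directly into fixed byte offsets, which is an assumption made purely for illustration; the names `fingerprint`, `diff_attack`, and `slots` are hypothetical and no real watermarking scheme is implied.

```python
def fingerprint(program: bytes, slots: list[int], customer_id: int) -> bytes:
    """Embed one bit of customer_id at each slot offset (toy scheme)."""
    marked = bytearray(program)
    for bit_index, offset in enumerate(slots):
        marked[offset] = (customer_id >> bit_index) & 1
    return bytes(marked)

def diff_attack(copy_a: bytes, copy_b: bytes) -> list[int]:
    """Return the offsets where two differently marked copies disagree."""
    return [i for i, (x, y) in enumerate(zip(copy_a, copy_b)) if x != y]

program = bytes(range(32, 96))     # stand-in for the unmarked program image
slots = [3, 17, 29, 40, 58]        # secret mark locations (assumed)
alice = fingerprint(program, slots, customer_id=0b10110)
bob = fingerprint(program, slots, customer_id=0b01101)

exposed = diff_attack(alice, bob)
print(exposed)  # [3, 17, 40, 58]
```

The slot at offset 29 stays hidden because both customers carry the same bit there, yet four of the five mark locations are revealed by a single comparison. A resilient scheme has to make the copies differ in ways that do not point directly at the mark.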

A  watermark  is  generally  required  to 
be  robust  (hard  to  remove).  In  some  situ¬ 
ations,  however,  a  fragile  watermark  is 
desirable;  it  is  destroyed  if  even  a  small 
alteration is made to the software (e.g., this is useful for making
the software tamper-evident).

Software  watermarks  can  be  static, 
i.e.,  readable  without  running  the  soft¬ 
ware,  or  could  appear  only  at  run-time 
(preferably  in  an  evanescent  form).  In 
either  case,  reading  the  watermark  usual¬ 
ly  requires  knowing  a  secret  key,  without 
which  the  watermark  remains  invisible. 

Watermarks may be used for proof of software authorship or ownership,
fingerprinting for identifying the source of illegal
information/software dissemination, proof of authenticity,
tamper-resistant copyright protection, and captioning to provide
information about the software.
When  software  watermarks  are  used  for 
proof  of  authorship  or  ownership  (cul¬ 
prit-tracing),  it  is  important  to  use  a  very 
resilient  scheme.  Recall  that  this  is  when 
the  watermark  contains  information 
about  the  copyright  owner  as  well  as  the 
entity  that  is  licensed  to  use  the  software, 
thus  allowing  trace-back  to  the  culprit  if 
the  item  were  to  be  illegally  disseminated 
to  others.  Breaking  the  security  of  such  a 
scheme  can  enable  the  attacker  to  frame 
an  innocent  victim. 

As  you  can  see,  while  watermarks  can 
demonstrate  authorized  possession  and 
the  fact  that  software  has  been  pirated, 
they  do  not  address  the  reverse  engineer¬ 
ing  or  authorized  execution  issues  of  the 
other  schemes  discussed;  therefore,  we 
advocate  the  development  and  use  of  a 
spectrum  of  software  protection  tech¬ 
niques. 

Guarding 

A guard is code that is injected into the software for the sake of AT
protection. A guard must not interfere with the program’s basic
functionality unless that program is tampered with; it is the
tampering that triggers a guard to take action that deviates from
normal program behavior. Examples of guard functionality range from
tasks as simple as comparing a checksum of a code fragment to its
expected value, to repairing code (in case it was maliciously
damaged), to complex and indirect forms of protection through subtle
side effects.
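
The simplest guard mentioned above, a checksum comparison, can be sketched as follows. This is an illustration in Python rather than a description of any particular guarding product: the function names are hypothetical, and hashing a function's compiled bytecode and constants is an assumed stand-in for checksumming a fragment of machine code.

```python
import hashlib

def licensed_feature(x: int) -> int:
    return x * 2

def checksum(func) -> str:
    """Hash the compiled bytecode and constants of a function."""
    code = func.__code__
    return hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

# In practice the expected value would be computed at build time.
EXPECTED = checksum(licensed_feature)

def guard(func, expected: str) -> bool:
    """Return True when the protected fragment still matches its expected checksum."""
    return checksum(func) == expected

print(guard(licensed_feature, EXPECTED))   # True

# Simulated tampering: swap different code into the protected function.
def patched(x: int) -> int:
    return x * 3

licensed_feature.__code__ = patched.__code__
print(guard(licensed_feature, EXPECTED))   # False
```

A single guard like this is itself a target, since an attacker can patch the comparison; this is the motivation for the mutually protecting guards described next.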

The  preferred  use  of  the  guarding 
approach  consists  of  injecting  into  the 
code  to  be  protected  a  large  number  of 
guards  that  mutually  protect  each  other  as 
well  as  the  software  program  in  which 
they  now  reside.  Guards  can  also  be  used 
to good effect in conjunction with hardware-based protection
techniques to further ensure that the protected software is
only  executed  in  an  authorized  environ¬ 
ment. 

The number, types, and stealthiness of guards; the protection topology
(who protects who); and where the guards are injected in the original
code and how they are entangled with it are some of the parameters
that determine the strength of the resulting protection. They are all
tunable in a manner that depends on the type of code being protected,
the desired level of protection, etc.
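
One way to reason about a protection topology is to treat it as a directed graph and look for guards that nothing else watches. The sketch below assumes a hypothetical topology expressed as a Python dictionary; the guard and target names are invented for illustration.

```python
# Hypothetical protection topology: each guard lists the targets it checks.
topology = {
    "guard_a": ["license_check", "guard_b"],
    "guard_b": ["crypto_core", "guard_c"],
    "guard_c": ["guard_a"],
    "guard_d": ["license_check"],
}

def unprotected_guards(topology: dict[str, list[str]]) -> set[str]:
    """Guards that no other guard watches and that could be removed unnoticed."""
    watched = {
        target
        for guard, targets in topology.items()
        for target in targets
        if target != guard          # a guard checking itself does not count
    }
    return set(topology) - watched

print(unprotected_guards(topology))  # {'guard_d'}
```

Here `guard_a`, `guard_b`, and `guard_c` form a cycle and protect one another, while `guard_d` is a weak point until some other guard is assigned to check it.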

Manually  installing  such  a  tangled  web 
of  protection  is  impractical  (as  it  must  be 
done  every  time  the  software  is  updated), 
so  it  is  important  that  this  protection  be 
done  in  a  highly  automated  fashion  using 
high-level  scripts  that  specify  the  protec¬ 
tion  guidelines  and  parameters.  It  should 
be  thought  of  as  a  part  of  the  compila¬ 
tion  process  where  an  anti-tamper  option 
results  in  code  that  is  guarded  and  tam¬ 
per-resistant. 

The  rationale  for  this  approach  is  that 
a  single  (even  if  elaborate)  AT  protection 
scheme  for  a  software  application  is 
insufficient  because  a  single  defense 
results  in  a  single  point  of  attack  that  can 
be  located  and  compromised.  To  make 
self-protection  robust,  the  defense  must 
not  rely  on  a  single  complex  protection 
technique  no  matter  how  effective  it 
might be. Instead, there needs to be a multitude of (possibly simple)
protection techniques installed in the program that cooperatively
enforce the code’s integrity as well as protect each other against
tampering.

The  guard’s  response  when  it  detects 
tampering  is  flexible  and  can  range  from  a 
mild  response  to  the  disruption  of  nor¬ 
mal  program  execution  through  injection 
of  run-time  errors  (crashes  or  even  subtle 
errors  in  the  answers  computed);  the  reac¬ 
tion  chosen  depends  on  the  software  pub¬ 
lisher’s  business  model  and  the  expected 
adversary.  Generally,  it  is  better  for  a 
guard’s  reaction  to  be  delayed  rather  than 
to  occur  immediately  upon  detection  so 
that  tracing  the  reaction  back  to  its  true 
cause  is  as  difficult  as  possible  and  con¬ 
sumes  a  great  deal  of  the  attacker’s  time. 


More on guarding can be found in [9].

AT  Process 

This  section  explores  the  various  AT 
guidelines  expressed  in  the  “Defense 
Acquisition  Guidebook”  [10],  and  the 
recommended  process  for  developing  a 
program  protection  plan.  The  “Defense 
Acquisition  Guidebook”  specifies  the  AT 
measures  that  should  be  considered  for 
use  on  any  system  with  critical  program 
information  (CPI),  developed  with  allied 
partners,  likely  to  be  sold  or  provided  to 
United  States  allies  and  friendly  foreign 
governments,  or  likely  to  fall  into  enemy 
hands.  The  first  step  in  the  recommended 
AT  methodology  is  to  develop  a  program 
protection  plan.  The  process  of  develop¬ 
ing  this  plan  includes  the  following: 

•  Develop  a  list  of  critical  technologies. 

•  Develop  a  threat  analysis. 

•  Develop  a  list  of  identified  vulnerabil¬ 
ities. 

•  Develop  a  preliminary  AT  require¬ 
ment. 

•  Perform an analysis of AT methods that apply to the system,
including cost/benefit assessments.

•  Provide an explanation of which AT method(s) will be implemented;
develop a plan for validating the AT implementation.

The  standard  approach  of  validating 
AT  protections  is  done  via  red-teaming.  A 
red  team  consists  of  individuals  who  are 
well  versed  in  security  methods  and  their 
corresponding  weaknesses.  Their  primary 
objectives  are  to  attempt  to  defeat  the 
protection,  to  assess  the  protection’s 
strengths  and  weaknesses,  and  to  make 
recommendations  for  improvement. 
While this is an effective method of evaluation, a major problem with
red teams is that the validation is done by humans and may not be
totally reliable or repeatable. Furthermore, as the need for AT
technologies grows, red teams are increasingly called upon to perform
evaluations. The teams are overwhelmed with assignments, and
significant delays in product evaluation and release result.
To  improve  this  process,  there  is  a  clear 
and  present  need  for  automated  testing 
and  validation  tools.  Such  tools  could  be 
used  to  define  a  standard  set  of  metrics 
and  guidelines  to  evaluate  software  pro¬ 
tections. 

Conclusion 

This  article  has  surveyed  the  motivation 
for  using  AT  technology,  the  hardware 
and  software  AT  techniques  in  use  today, 
and  the  strengths  and  weaknesses  of  AT 
technologies. We also briefly introduced the process and documentation
used to
develop  a  program  protection  plan.  The 
motivation  for  and  primary  objective  of 
AT  technology  is  to  protect  CPI  by  pre¬ 
venting  unauthorized  modification  and 
use of software. The main software AT techniques include encryption
wrappers, code obfuscation, watermarking/fingerprinting, and guarding.

A  fundamental  challenge  faced  by 
software  AT  technology  is  that  the  pro¬ 
tected  application  is  running  on  a  host 
that  is  not  trusted,  and  thus  cannot  be 
assured  to  be  secure.  Guards  address  this 
shortfall  to  a  degree  and  in  a  flexible  and 
extensible  manner.  However,  in  light  of 
the  need  for  robust,  seamless,  compre¬ 
hensive  software  defense,  using  both 
software  and  hardware  AT  solutions  in 
combination  often  offers  an  appealing 
alternative  to  using  them  individually 
(especially  if  economic  considerations 
are  factored  in). 

At  this  time,  indications  are  that  if 
strong  software  AT  technology  (e.g.,  in 
the  form  of  judiciously  constructed 
guards)  is  added  to  an  application  so  that 
it  requires  the  presence  of  a  lightweight 
tamper-resistant  hardware  device  in  order 
to  execute  properly,  the  result  is  a  strong 
yet economical software protection capability. ♦

Safety Analysis as a Software Tool

As a software development tool, an independent software safety
analysis by trained analysts reduces losses of development resources
and schedule, improves product quality, and prevents costly mishaps
that occur during the operational phase of the system life cycle. The
key issues of an effective and efficient software safety analysis
include (1) financial and managerial independence from the software
development activity, (2) trained and qualified personnel to perform
the analysis, and (3) a disciplined process that focuses the analysis
effort, by priority, on the more safety critical areas.


Performing  an  independent  system  and 
software  safety  analysis  on  embedded 
software  saves  overall  life-cycle  cost  and 
schedule  resources,  and  provides  a  better 
overall  product.  The  primary  objective  of 
safety  analysis  is  to  find  and  remove 
embedded  safety  related  hazards  in  the 
hardware  and  software  systems  before  a 
mishap  occurs. 

Finding  these  embedded  hazards  early 
in  the  development  cycle  reduces  cost, 
safeguards  schedules,  and  improves  prod¬ 
uct  quality  [see  Figure  1].  The  reduction  in 
added  costs  and  schedule  slips  due  to 
problems  found  late  in  the  development 
cycle  and  the  improvement  in  product 
quality  justify  the  cost  of  performing  an 
independent  software  safety  analysis. 
Additionally,  preventing  a  single  cata¬ 
strophic  mishap  by  removing  an  embed¬ 
ded  hazard  could  more  than  pay  for  the 
independent  safety  analysis  effort  many 
times  over,  depending  upon  the  system. 

This  article  identifies  key  terms  associ¬ 
ated  with  system  and  software  safety,  pro¬ 
vides  a  process  for  performing  software 
safety  analysis,  specifies  the  required  envi¬ 
ronment  for  efficiently  and  effectively  per¬ 
forming  safety  analysis,  provides  cost  and 
schedule  savings  rationale,  and  identifies 
issues  that  delay  or  prevent  effective  safe¬ 
ty  analyses.  Although  this  article  empha¬ 
sizes  performing  safety  analysis  on  soft¬ 
ware,  a  thorough  software  safety  analysis 
includes  a  system  safety  analysis  as  many 
of  the  embedded  hazards  occur  at  inter¬ 
faces  between  system  components. 

When  developing  software  systems,  a 
tool  enables  a  developer  to  build  better  sys¬ 
tems  quicker.  The  systems  are  more  effec¬ 
tive,  more  efficient,  and  safer.  Involving  an 
independent  software  safety  analysis  con¬ 
tributes  to  these  attributes  and  becomes  a 
tool  that  should  be  used  in  today’s  complex 
system  development  efforts. 

Software  Safety  Analysis 
Process 

An effective process for performing a software safety analysis
includes four primary steps (see Figure 2):

•  Step  1.  Identify  safety-critical  areas 
and  system  safety  hazards. 

•  Step  2.  Trace  implementation  of  safe¬ 
ty-critical  requirements  to  the  design 
and  its  corresponding  code. 

•  Step  3.  Verify  correct  system  use  and 
implementation  of  safety-critical  data 
and  processing. 

•  Step 4. Track identified hazards throughout the system life cycle.

Each  step  is  discussed  in  the  following 
paragraphs. 

Step 1

The  first  step  is  to  list  all  system  require¬ 
ments,  the  relative  level  of  safety  criticali¬ 
ty  for  each  system  requirement  with 
respect  to  the  other  requirements,  and  the 
applicable documented hazards for safety-critical requirements. Each
system/subsystem requirement must be evaluated for
criticality  with  respect  to  potential  haz¬ 
ards.  The  safety  criticality  of  a  require¬ 
ment  depends  upon  the  identified  hazards 
and  other  items  such  as  remoteness,  con¬ 
tributory  impact,  redundancies,  and 
human  intervention. 

System  hazards  vary  depending  upon 
the  function  and  use  of  the  system.  These 
hazards  must  be  identified  and  document¬ 
ed.  Sub-hazards  that  contribute  to  higher- 
level  hazards  need  to  be  specified  to  a  suf¬ 
ficient  detail.  An  example  of  a  hazard 
might  be  erroneous  activation  of  release. 
Examples  of  corresponding  contributing 
sub-hazards  could  be  (1)  erroneous  release 
signal,  (2)  erroneous  status  display,  and  (3) 
malfunctioning  safety  lock.  This  list  of 
system  requirements  becomes  the  plan 
that  is  used  to  perform  software  safety 
analysis. 
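
The Step 1 list can be pictured as a small data structure. The sketch below is only one possible representation: the class names, the `REQ-n` identifiers, the requirement texts, and the convention that criticality 1 is the most critical are all assumptions made for illustration, while the hazard and sub-hazard names come from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Hazard:
    name: str
    sub_hazards: list["Hazard"] = field(default_factory=list)

@dataclass
class Requirement:
    req_id: str          # hypothetical identifier scheme
    text: str
    criticality: int     # relative rank, 1 = most safety critical (assumed)
    hazards: list[Hazard] = field(default_factory=list)

release = Hazard("erroneous activation of release", [
    Hazard("erroneous release signal"),
    Hazard("erroneous status display"),
    Hazard("malfunctioning safety lock"),
])

requirements = [
    Requirement("REQ-2", "Display store status to the operator", 2, [release]),
    Requirement("REQ-3", "Log maintenance events", 3),
    Requirement("REQ-1", "Release only on a valid operator command", 1, [release]),
]

# The analysis plan is the requirement list ordered by relative criticality.
plan = sorted(requirements, key=lambda r: r.criticality)
for r in plan:
    print(r.criticality, r.req_id, [h.name for h in r.hazards])
```

Keeping sub-hazards attached to their parent hazard makes it easy to see which lower-level failures contribute to each documented system hazard.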

Step  2 

Using  the  safety  criticality  priority  estab¬ 
lished  by  the  list  from  Step  1,  the  second 
step  is  to  evaluate  requirements.  This  step 
correlates  the  functional  requirements  to 
(1) the top level and detailed level design descriptions and (2) the
source code implementation. The code is verified for
correct  implementation  of  the  require¬ 
ments  as  well  as  for  correct  syntax  and 
safe  coding  practices.  This  analysis  also 
includes  identifying  specific  safety-critical 
data  items  and  processing  within  the 
reviewed  code  module  for  use  in  the  next 
step.  An  example  of  a  safety-critical  data 
item  could  be  a  weapon  release  variable. 
An  example  of  safety-critical  processing 
could  be  the  processing  required  to  set  or 
reset  the  release  signal. 

Figure 1: Software Safety Analysis Benefits
Figure 2: Software Safety Analysis Process

Requirements of higher safety criticality are evaluated before those
of lesser criticality. The intent of safety analysis is to find the
more critical hazards that would cause mishaps of higher severity.
Because there will always be some residual mishap risk, particularly
when software is involved, performing a complete and thorough software
safety analysis would be both cost prohibitive and impossible.
Consequently, items of lesser safety criticality may not be reviewed
so that items that are more safety critical can be more deeply and
completely reviewed.
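
The trade-off described above, reviewing in criticality order until resources run out, can be sketched as a simple allocation. The item names, hour estimates, and budget are invented for illustration, and criticality 1 is assumed to be the most critical.

```python
def select_for_review(items, budget_hours):
    """Review items in criticality order, skipping any that no longer fit the budget.

    items: (name, criticality, estimated_hours); criticality 1 is most critical.
    """
    reviewed, skipped = [], []
    remaining = budget_hours
    for name, criticality, hours in sorted(items, key=lambda item: item[1]):
        if hours <= remaining:
            reviewed.append(name)
            remaining -= hours
        else:
            skipped.append(name)
    return reviewed, skipped

items = [
    ("status display", 2, 30),
    ("event logging", 4, 20),
    ("release signal", 1, 60),
    ("built-in test", 3, 25),
]
reviewed, skipped = select_for_review(items, budget_hours=100)
print(reviewed)  # ['release signal', 'status display']
print(skipped)   # ['built-in test', 'event logging']
```

The two most critical items consume 90 of the 100 hours, so the lower-priority items go unreviewed, which mirrors the point made in the text.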

Step  3 

The  third  step  is  to  evaluate  the  safety- 
critical  data  and  processing  identified  in 
Step  2  in  the  context  of  the  system. 
Particular  attention  is  given  to  the  inter¬ 
faces  between  subsystems,  sequencing  of 
state  changes,  and  timing  windows  of 
vulnerability. This type of analysis often receives insufficient
attention during software development, testing, and peer reviews
because of the added complexity and time required to be thorough.

Step  4 

In  the  fourth  step,  identified  hazards  are 
documented  and  communicated  to  the 
development  organization.  The  safety 
analysis  effort  tracks  the  identified  haz¬ 
ard  until  it  is  removed  from  the  software. 
As  software  hazards  tend  to  be  repeated 
in  other  areas  and  applications,  the  haz¬ 
ard  is  added  to  the  lessons  learned  software 
safety  analysis  database.  The  hazards 
from  this  database,  as  well  as  hazards 
identified  on  industry  generic  safety  lists, 
are  used  for  training  of  safety  analysis 
engineers  and  for  performing  evaluation 
checklists  when  future  modifications  are 
made  to  the  software  being  analyzed  or 
other  software  in  other  systems. 

An  example  of  a  generic  safety  list 
can  be  found  in  Appendix  E  of  the  Joint 
Software  System  Safety  Committee’s 
“Software  System  Safety  Handbook”  [1]. 
This combined list of software-specific hazards is very large. It
would be cost prohibitive to analyze every line of code against every
item in the hazard list. The implementation freedom that software
allows precludes an all-comprehensive automated tool that checks every
line of code for every possible hazard. We have found that a trained
analyst who is current with the list of software hazards is more
efficient and effective in performing the safety analysis. Software
utilities and tools can be, and often are, used to help the analyst
more quickly locate similar patterns, occurrences, and uses.
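
A utility of the kind mentioned above can be as simple as a pattern search over the source. The sketch below uses Python's standard `re` module; the sample source text and the variable name `release_signal` are hypothetical, chosen to echo the release example used earlier.

```python
import re

def find_pattern(source: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line number, line text) for every line matching the regular expression."""
    regex = re.compile(pattern)
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if regex.search(line)
    ]

source = """\
release_signal = False
if armed and operator_command:
    release_signal = True
status = read_status()
release_signal = False  # reset after use
"""

# Locate every place the safety-critical variable is assigned.
for number, line in find_pattern(source, r"\brelease_signal\s*=[^=]"):
    print(number, line)
```

Such a search only narrows down where the analyst should look; judging whether each assignment is safe in the context of the system remains the analyst's task.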

Software  Safety  Analysis 
Environment 

To  perform  effective  and  efficient  soft¬ 
ware  safety  analysis,  an  environment  of 
three  components  is  required:  (1)  finan¬ 
cial  and  managerial  independence  from 
the  software  development  activity,  (2) 
trained  and  qualified  personnel  to  per¬ 
form  the  analysis,  and  (3)  a  disciplined 
process  that  focuses  the  analysis  effort, 
by  priority,  on  the  more  safety-critical 
areas  [see  Figure  3]. 

Each  of  these  components  is  neces¬ 
sary  for  a  successful  software  safety 
analysis.  The  absence  of  any  undermines 
the  effort  and  the  strength  of  the  other 
components. For example, without financial and managerial independence
from the development activity, the analyst may be directed in a way
that inhibits fully performing the analysis, or the disciplined
process may be circumvented because of management direction caused by
a need to use resources in areas other than safety analysis. Similar
examples can be drawn for the absence of the other components.

Financial  and  managerial  indepen¬ 
dence  ensures  that  specific  resources  will 
be  used  for  performing  the  software 
safety  analysis  and  that  there  is  no  con¬ 
flict  of  interest  between  the  development 
activity  and  the  software  safety  analysis 
activity. An example of a conflict of interest would be the
program office or development organization steering the safety
analysis toward or away from specific hazards or risks. The
criticality list from Step
1  is  the  road  map  that  identifies  the  pri¬ 
ority  of  safety-critical  areas  to  be 
reviewed.  The  resources  are  used  in 
accordance  with  this  priority,  which 
means  that  safety-critical  areas  of  lower 
priority  may  not  be  reviewed  or  analyzed 
because  of  the  need  to  use  resources  to 
analyze  safety-critical  areas  of  higher  pri¬ 
ority. 
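
The resource rule just described can be sketched in a few lines of Python. This is an illustration only: the area names, the hour estimates, and the `plan_analysis` function are hypothetical and are not taken from the authors' process. It assumes each area on the criticality list carries a priority and an effort estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalArea:
    """One entry in a (hypothetical) criticality list."""
    name: str
    priority: int         # 1 is the most safety-critical
    hours_needed: float   # estimated effort to analyze the area fully


def plan_analysis(areas, budget_hours):
    """Fund areas strictly in priority order until the budget runs out.

    Returns (funded, unfunded). Funding stops at the first area that
    cannot be fully covered, so a lower-priority area is never analyzed
    ahead of a higher-priority one.
    """
    funded, unfunded = [], []
    remaining = budget_hours
    for area in sorted(areas, key=lambda a: a.priority):
        if not unfunded and area.hours_needed <= remaining:
            funded.append(area)
            remaining -= area.hours_needed
        else:
            unfunded.append(area)
    return funded, unfunded


criticality_list = [
    CriticalArea("telemetry logging", priority=3, hours_needed=40),
    CriticalArea("weapon release logic", priority=1, hours_needed=120),
    CriticalArea("flight mode switching", priority=2, hours_needed=80),
]
funded, unfunded = plan_analysis(criticality_list, budget_hours=200)
print("analyze:", [a.name for a in funded])
print("not reviewed:", [a.name for a in unfunded])
```

With a 200-hour budget the two highest-priority areas are analyzed and the third is left unreviewed, which is the trade-off the criticality list makes explicit.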

An  example  of  implementing  an 
independent  safety  analysis  effort  would 
be  for  the  program  office  to  contract 
separate  activities  to  the  development 
organization  and  the  software  safety 
analysis  organization.  As  such,  reports  to 
the  program  office  from  the  software 
safety analysis organization are independent of the software
development activity. Another example would be for the safety
office to contract independently with the software safety analysis
organization to analyze work that the program office has
contracted to the software development organization.

Performing  a  successful  software 
safety  analysis  requires  a  technical  staff 
that  is  qualified  to  perform  safety  analy¬ 
ses  and  enjoys  doing  this  type  of  work. 
Most  engineers  prefer  building  new  sys¬ 
tems  and  being  part  of  the  development 
process  of  major  systems.  Many  engi¬ 
neers  find  no  interest  in  digging  through 
systems  to  find  embedded  hazards  that 
are  not  only  difficult  to  find  and  under¬ 
stand,  but  also  have  not  evidenced  them¬ 
selves.  It  becomes  the  proverbial  looking 
for  a  needle  in  the  haystack,  except  there 
is  really  no  clue  that  the  needle  is  even  in 
the  haystack  or  even  in  a  number  of 
haystacks.  Finding  engineers  who  can  do 
this  type  of  work  and  want  to  do  it  for 
many  years  is  difficult.  Some  can  do 
analysis  for  a  year  or  so;  however,  the 
strength  of  a  software  safety  analysis 
organization  comes  from  analysts  who 
have  done  analysis  for  many  years  on 
many  systems. 

Two pitfalls are seen when companies set up software safety
organizations. First, the individuals chosen to perform software
safety analysis are not correctly screened with respect to ability
and desire. Oftentimes, they are selected from those who have not
been successful in code development because of work habit issues
or lack of ability. The fallacy of doing this is that performing
successful analysis on code that is generated by the development
activity requires individuals who are better trained and more
capable than those who developed the code. They must be able to
find embedded hazards, missed by other development reviews and
testing, that could result in a mishap during the operational
phase of the system life cycle.

Figure 3: Safety Analysis Environment for Success

Safety Terminology

For the purposes of this article, the following safety-related
terms are provided from [2]:

• A mishap is an unplanned event that results in death, injury,
occupational illness, equipment or property damage or loss, or
environmental damage.

• Hazards are conditions that cause a mishap.

• The ultimate goal of a system safety program is to design systems
that contain no hazards. However, since the nature of most complex
systems makes it impossible or impractical to design them
completely hazard-free, a successful system safety program often
provides a system design where there exist no hazards resulting in
an unacceptable level of mishap risk.

• Mishap risk is an expression of the possibility/impact of a
mishap in terms of hazard severity and hazard probability.

• Residual mishap risk is the remaining mishap risk after all
mitigation techniques (techniques used to remove or lessen the
hazard) have been implemented or exhausted.

• Safety is the freedom from hazards that cause death, injury,
occupational illness, equipment or property damage or loss, or
environmental damage.

• The objective of a safety analysis is to achieve acceptable
mishap risk through a documented systematic approach to hazard
analysis, risk assessment, and risk management.

• System safety is the application of engineering and management
principles, criteria, and techniques to achieve acceptable mishap
risk, within the constraints of operational effectiveness and
suitability, time, and cost, throughout all phases of the system
life cycle.

The  second  pitfall  of  setting  up  soft¬ 
ware  safety  organizations  is  that  the 
development  activity  expects  that  the 
code  developer  should  be  able  to  gener¬ 
ate  good  code  that  contains  no  safety 
hazards.  Consequently,  the  development 
activity  either  sees  no  value  in  perform¬ 
ing  an  independent  safety  analysis  or  lim¬ 
its  safety  analysis  resources  to  the  point 
that  little  can  be  done  to  effectively 
accomplish  a  thorough  safety  analysis. 
They  fail  to  understand  that  there  is  a 
basic  difference  between  engineers  who 
develop  code  and  engineers  who  analyze 
code  for  embedded  hazards.  Engineers 
who  develop  code  are  success  oriented. 
They  move  from  the  implementation  of 
one requirement to the next. They are typically behind schedule,
over budget, and always anxious to catch up.
Conversely,  safety  engineers  analyze  code 
in  the  context  of  finding  failures.  They 
move  from  analyzing  one  module  to  the 
next  only  when  they  are  convinced  that 
there  are  no  embedded  safety  concerns. 

Complexity  Warrants  Additional 
Safety  Analysis 

With  our  mentality  of  getting  the  most 
out  of  software  development  budgets 
combined  with  the  mindset  that  develop¬ 
ers  can  generate  hazard-free  code,  man¬ 
agers  pressure  software  developers  to 
generate  code  faster  and  more  efficiently. 
They  insist  that  better  processes,  pride  of 
workmanship,  better  compilers  and 
development  environment  tools,  code 
walk-throughs,  and  peer  reviews  should 
sufficiently  guarantee  safe  code.  It  is  true 
that  compilers  and  development  environ¬ 
ment  tools  are  becoming  more  powerful, 
but  at  the  same  time  they  are  becoming 
more  complex.  This  additional  complex¬ 
ity  warrants  additional  safety  analysis.  It 
is  true  that  code  walk-throughs,  peer 
reviews,  and  other  process  improve¬ 
ments  result  in  better  code;  however, 
they rely upon peers who are also behind schedule, over budget,
and anxious to get
their  own  work  done.  These  distractions 
lessen  the  effectiveness  of  the  peer 
review.  Additionally,  peer  reviews  tend  to 
focus  on  the  module  level  and  place  less 
emphasis  at  the  system  level  where  many 
of  the  embedded  safety  hazards  reside. 

Software  developers  must  be  safety 
conscious  as  they  develop  code. 
However,  in  light  of  the  above  and  their 
success  orientation,  developers  continue 
to  introduce  embedded  hazards  in  the 
software  development  process;  it  is  very 
difficult  for  them  to  see  their  own  errors. 
E-mails  are  a  vivid  example.  E-mail 
authors  re-read  their  own  e-mails  over 
and  over  to  verify  correctness.  They  send 
them  out  only  to  later  find  a  glaring  error 
in  the  most  awkward  place  that  they 
missed  during  multiple  reviews.  Some 
well-known  examples  of  software  fail¬ 
ures  resulting  in  mishaps  are  described  in 
Appendix  F  of  the  Joint  Software  System 
Safety  Committee’s  “Software  System 
Safety  Handbook”  [1]. 

Engineers  who  effectively  analyze 
code  for  embedded  hazards  are  con¬ 
vinced  that  all  software  contains  embed¬ 
ded  hazards  and  that  it  is  only  a  matter  of 
time and circumstance before the hazard(s) cause a mishap. The
quality and
quantity  of  analysis  is  a  function  of  the 
analyst’s  safety  experience  and  under¬ 
standing  of  the  code  under  inspection 
within  the  context  of  the  system. 
Tangible products of the analysis may be misleading, as the amount
and quality of product do not necessarily prove that the right
analysis was performed. On the other hand, the tangible products
of a development effort do prove the efforts of the development
engineer.

From  the  development  activity  per¬ 
spective,  if  the  code  successfully  per¬ 
forms  its  intended  function  and  matches 
documented  code  standards,  then  return 
on  investment  is  evident.  Failure  to  iden¬ 
tify  embedded  hazards  does  not  confirm 
that  quality  analysis  has  not  been  per¬ 
formed  any  more  than  the  identification 
of  some  embedded  hazards  ensures  that 
all hazards have been found. Without evidence that hazards exist,
the analyst must rely on experience and understanding to decide
how much effort to spend analyzing a safety-critical area.

In  summary,  development  engineers 
are  good  at  building  new  systems  in  the 
context  of  a  driving  schedule.  Software 
safety  engineers  are  good  at  evaluating 
code  for  embedded  hazards.  Requiring 
development  engineers  to  constantly 
evaluate  their  code  with  the  understand¬ 
ing  that  something  is  wrong  and  there  are 
embedded  hazards  seriously  takes  away 
from  the  success  orientation  that  enables 
forward  progress.  Software  safety  analy¬ 
sis  engineers  are  attuned  to  the  identifi¬ 
cation  of  embedded  hazards  and  the 
amount  of  resources  required  to  fully 
analyze  a  safety-critical  area  of  code.  Just 




as letting off an automobile’s gas pedal does perform some
slowing, and letting off the brake pedal allows continued
movement, both the gas pedal and the brake pedal are required for
efficient handling. A combination of development engineers and
software safety engineers in an independent environment yields a
better product than either group could produce by performing both
tasks.

The  software  safety  analysis  process 
combines  the  people  and  the  resources 
to  produce  the  most  effective  and  effi¬ 
cient  product  possible.  The  process 
ensures  that  priorities  are  followed, 
products  are  produced,  and  schedules 
are  met.  The  four  primary  steps  of  a 
software  safety  analysis  process  have 
been  described.  Necessary  products  of 
the  safety  analysis  include  a  criticality 
analysis report from Step 1, problem
reports  from  all  steps,  and  a  software 
safety  analysis  report  —  including  testing 
and  analysis  summaries  —  from  Step  2 
through  Step  4  of  the  process.  When  a 
thorough  and  conscientious  software 
safety  analysis  is  complete,  and  safety 
hazards  have  been  identified  and 
removed,  the  resulting  summary  report 
becomes a tangible product that indicates with a high level of
confidence that the examined software will not be the source of a
system mishap.

Issues  That  Hinder  Software 
Safety  Efforts 

There  are  many  reasons  why  organiza¬ 
tions  mistakenly  choose  not  to  include  a 
software  safety  analysis  activity  early  in 
the  code  development  cycle  (see  Table  1). 
These  include  the  following: 

1.  Organizations  erroneously  believe 
that  performing  software  safety 
analysis  only  needs  to  be  done  when 
code  has  been  generated.  They 
believe  that  they  can  conserve 
resources  during  the  requirements 
definition  and  design  disclosure  phas¬ 
es  by  waiting  until  code  is  released  to 
involve  the  software  safety  analysis 
effort.  They  fail  to  understand  the 
importance  of  evaluating  system  and 
functional  requirements  with  respect 
to  safety  prior  to  design,  and  of  eval¬ 
uating  the  design  disclosure  with 
respect  to  safety  prior  to  coding. 
Safety concerns found during the implementation phase after the
code has been generated require re-evaluation of the requirements,
redesign, and recoding. This results in wasted resources and
schedule slips because of the necessary review and rework.
This  is  further  impacted  by  the  soft¬ 
ware  safety  analyst’s  need  for  time  to 
become  familiar  with  the  function, 
requirements,  design,  and  code  of  the 
software under analysis. If this familiarization is put off until
code is released, safety concerns are identified later in the
implementation phase, wasting additional resources because testing
must also be repeated for the reworked requirements, design, and
code.

2.  Government  organizations  find  it  dif¬ 
ficult  and  time  consuming  to  estab¬ 
lish  a  contract  with  an  independent 
organization  to  do  software  safety 
analysis.  It  is  important  to  start  the 
process  early  to  take  into  account  the 
lead  times  as  well  as  the  need  for 
either  contracting  directly  with  the 
software  safety  analysis  company  or 
using  a  contract  vehicle  already  in 
place  by  the  contractor. 

3.  The  organization  erroneously  be¬ 
lieves  that  a  good  code  development 
process  will  preclude  all  embedded 
safety  hazards.  Mishaps  caused  by 
software  occur  in  fielded  systems  that 
were  developed  under  good  process¬ 
es.  As  described  earlier,  an  indepen¬ 
dent  software  safety  analysis  can  find 
embedded  hazards  and  prevent 
mishaps  when  trained  and  experi¬ 
enced  analysts  are  used  and  the  soft¬ 
ware  safety  analysis  process  is  fol¬ 
lowed. 

4.  Organizations  fail  to  factor  into  their 
budget  the  software  safety  analysis 
activity  when  cost  projections  are 
supplied  to  planning  activities.  Upon 
program  execution,  they  severely  limit 
or  do  not  fund  software  safety  activi¬ 
ties  because  of  the  difficulty  of  find¬ 
ing  unbudgeted  resources  to  cover 
safety.  Including  software  safety 
analysis  activities  in  the  master  budget 
plan  is  critical  to  software  safety. 

5.  The  lack  of  mishap  evidence  gives 
the  program  manager  a  false  impres¬ 
sion  of  the  safety  state  of  the  soft¬ 
ware  being  developed.  If  an  embed¬ 
ded  hazard  is  found  and  removed, 
there  is  no  evidence  that  the  mishap 
would  have  ever  occurred.  Embedded 
hazards  cause  catastrophic  mishaps 
only  when  a  set  of  combining  cir¬ 
cumstances  simultaneously  occurs.  A 
thorough  analysis  covers  areas  and 
combinations  of  events  that  are 
either  difficult  to  test  or  are  not  test¬ 
ed  because  of  limitations  due  to  test 
time  and  tester  expertise. 

6. Organizations have difficulty finding software safety analysis
expertise and processes. As described earlier, effective analysis
is a function of the expertise and experience of the analyst.
Qualified sources for software safety expertise will probably be
more costly because of the need to employ this level of expertise
and experience.

7. Organizations are concerned that problems found by software
safety analysis will cause additional work that impacts schedule
and resource needs. Reputable organizations do not generate unsafe
software. However, because of the nature of embedded hazards that
result in mishaps, there is always the concern that large amounts
of resources are spent to prevent mishaps that have a very low
probability of occurring. These organizations fail to understand
that providing a small level of software safety analysis can
greatly lower the probability of a mishap occurring.

Table 1: Mistaken Reasons Why Software Safety Is Not Included During Early Phases of the System Development Life Cycle

Reason | Actual Need
Conserve funds because there is no code to review. | Early evaluation of requirements and design precludes costly coding and testing errors.
Difficulty establishing a contract with an independent organization to do safety. | Most major defense contractors have General Services Administration-type contracts that could support safety efforts.
Misconception that good coding processes preclude embedded safety hazards. | Success orientation of development engineers results in missed errors; development environment pressures prevent thorough system and interface analysis.
Failure to budget in software safety analysis activities. | Software safety analysis needs to be budgeted independently of development activity budgets.
Lack of mishap evidence if hazard found and removed. | The objective of software safety is to remove hazard before mishap; prevention of one catastrophic mishap more than pays for safety analysis effort.
Lack of software safety analysis expertise and processes. | Evaluate potential safety analysis organizations on track record, processes, and staff.
Problems found in safety analysis cause additional work impacts. | Early identification of safety problems saves resources by preventing redesign, recode, retest, and prevents mishap.

Each  of  these  mistaken  reasons  is 
real.  Together  they  may  discourage  using 
software  safety  analysis  as  a  tool  to  gen¬ 
erate  a  better  product  for  less  cost. 
Finding  these  embedded  hazards  early  in 
the  development  cycle  reduces  cost,  safe¬ 
guards  schedules,  and  improves  product 
quality.  Our  experience  shows  that  a 
requirements  problem  that  is  not  found 
until  the  test  phase  of  the  software 
development  cycle  results  in  the  loss  of 
70  percent  of  the  time  used  to  design, 
code,  and  test  the  implementation  of 
that  requirement. 

Summary 

We  live  in  a  world  that  is  averse  to  unsafe 
conditions. We also live in a world that applies heavy pressure to
build better products faster and more efficiently. The conflict
between these two mindsets is one of profit versus risk. The
courts of the land
insist  daily  upon  the  responsibility  of  the 
product  provider.  Flashy  packaging  and 
brand-name  recognition  oftentimes  erro¬ 
neously  instill  within  us  a  false  sense  of 
trust.  And  if  we  are  harmed,  our  loss  of 
productivity  and  capability  demands 
compensation  in  order  to  survive. 

Software  safety  analysis  as  a  tool 
results  in  a  safer  and  better  product  at  a 
cost and schedule savings. Early involvement is critical to an
efficient and effective analysis effort. Software requirements
hazards will be found and removed during the requirements phase.
Hazards introduced during the other software development phases
will likewise be found during the phase in which they arise,
preventing loss of resources and schedule.

Software  development  teams  want  to 
generate  a  quality  product,  but  are  hesi¬ 
tant  to  have  independent  activities  per¬ 
form  analysis  on  their  product.  A  change 
of  mindset  will  result  in  a  synergistic 
team  that  produces  a  superior  product. 
Development  engineers  will  be  able  to 
do  what  they  do  best  in  a  success-orient¬ 
ed  environment  within  their  resources 
and  schedules.  Software  safety  analysis 
engineers  will  provide  the  necessary 
checks  and  balances  that  result  in  a  supe¬ 
rior  product,  free  of  embedded  hazards. 
When  these  work  as  a  team,  software 
development will cost less and be provided on schedule in our
world of continuous change and improvement. ♦

Three Essential Tools for Stable Development


Andy  Hunt  and  Dave  Thomas 

The Pragmatic Programmers, LLC

Three  basic  practices  make  the  difference  between  a  software  project  that  succeeds  and  one  that  fails.  These  practices  support 
and reinforce each other; when done properly, they form an interlocking safety net to help ensure success and prevent common
project  disasters.  However,  few  development  teams  in  the  United  States  use  these  proven  techniques,  and  even  fewer  use  them 
correctly. 


Many  software  projects  that  fail  seem 
to  fail  for  very  similar  reasons.  After 
observing  -  and  helping  -  many  of  these 
ailing  projects  over  the  past  couple  of 
decades,  it  seems  clear  to  us  that  a  majori¬ 
ty  of  common  problems  can  be  traced 
back  to  a  lack  of  three  very  basic  practices. 
Fortunately,  these  three  practices  are  easy 
and relatively inexpensive to adopt. Adopting them does not
require a large-scale, expensive, or bureaucratic effort; with
just these practices in place, your team can work at top
speed  with  increased  parallelism.  You  will 
never  lose  precious  work,  and  you  will 
know  immediately  when  the  development 
starts  to  veer  off-track  in  time  to  correct  it, 
cheaply  and  easily. 

The  three  basic  practices  that  we  have 
identified as being the most crucial are version control, unit
testing, and automation. Version control is an obvious best
practice,
yet  nearly  40  percent  of  software  projects 
in  the  United  States  do  not  use  any  form 
of  version  control  for  their  source  code 
files  [1].  The  motto  of  these  shops  seems 
to  be  last  one  in  wins.  That  is,  they  will  use  a 
shared  drive  of  some  sort  and  hope  that 
no  one  overwrites  their  changes  as  the 
software  evolves.  Hope  is  a  pretty  poor 
methodology,  and  these  teams  regularly 
lose  precious  work.  Developers  begin  to 
fear  making  any  changes  at  all,  in  case  they 
accidentally  make  the  system  worse.  Of 
course,  this  fear  becomes  a  self-fulfilling 
prophecy  as  necessary  changes  are  neglect¬ 
ed  and  the  system  begins  to  degrade. 

Unit  testing  is  a  coding  technique  for 
programmers  so  they  can  verify  that  the 
code  they  just  wrote  actually  does  some¬ 
thing  akin  to  their  intent.  It  may  or  may 
not  fulfill  the  requirements,  but  that  is  a 
separate  question:  If  the  code  does  not  do 
what  the  programmer  thought  it  did,  then 
any further testing or validation is both meaningless and a large
waste of time and
money  (two  items  that  are  in  short  supply 
to  begin  with).  Developer-centric  unit 
testing  is  a  great  way  to  introduce  basic 
regression  testing,  create  more  modular¬ 
ized  code  that  is  easier  to  maintain,  and 
ensure  that  new  work  does  not  break 
existing  work.  Despite  the  effectiveness  of 
this  technique  in  both  improving  design 
and  identifying  and  preventing  defects 
(aka  bugs),  76  percent  of  companies  in  the 
United  States  do  not  even  try  it  [2]. 

Automation  is  a  catchall  category  that 
includes  regular,  unattended  project  builds, 
including  regression  tests  and  push-button 
convenience  for  day-to-day  activities. 
Regular builds ensure that the product can be built and catch
simple mistakes early and easily, when fixing them is cheapest.
When  implemented  properly,  it  is  as  if  you 
have an ever-vigilant guardian looking over
your  shoulder,  warning  you  as  soon  as 
there  is  a  problem.  Incredibly,  some  70 
percent  of  projects  in  the  United  States  do 
not  have  any  sort  of  daily  build  [2].  By  the 
time  they  discover  a  problem,  it  has  metas¬ 
tasized  into  a  much  larger  and  potentially 
fatal  problem. 
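
As a rough illustration of an unattended build, the Python sketch below runs a list of build steps in order and stops at the first failure so the problem is reported while it is still small. The step names and the `run_build` helper are hypothetical; in a real project each step would invoke your own checkout, compile, and test commands rather than the stand-in lambdas used here.

```python
import datetime


def run_build(steps):
    """Run named build steps in order and stop at the first failure.

    `steps` is a list of (name, callable) pairs; each callable returns
    True on success. Returns a list of (name, status) for the report.
    """
    report = []
    for name, action in steps:
        try:
            ok = bool(action())
        except Exception as exc:   # a crashing step counts as a failure
            ok = False
            name = f"{name} ({exc})"
        report.append((name, "ok" if ok else "FAILED"))
        if not ok:
            break                   # later steps depend on earlier ones
    return report


def build_succeeded(report):
    """True only if every step that ran reported success."""
    return all(status == "ok" for _, status in report)


# Hypothetical stand-ins for real commands such as a fresh checkout,
# a compile, and the regression test run.
nightly_steps = [
    ("checkout", lambda: True),
    ("compile", lambda: True),
    ("regression tests", lambda: False),   # simulate a broken test
    ("package", lambda: True),
]

report = run_build(nightly_steps)
print(datetime.date.today().isoformat(), "build",
      "passed" if build_succeeded(report) else "BROKEN")
for name, status in report:
    print(f"  {name}: {status}")
```

Scheduling a script like this to run every night, and mailing the report to the team, is one simple way to get the early warning described above.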

We  will  briefly  examine  each  of  these 
areas,  with  an  in-depth  look  at  unit  testing 
in  particular.  We  will  outline  the  important 
ideas,  synergies,  and  caveats  for  each  of 
these  practices  so  your  team  can  either 
begin  using  them  or  improve  your  current 
use  of  them. 

Version  Control 

Everyone can agree that version control is a best practice, but
even with it in place, is it being used effectively? Ask yourself
these  questions:  Can  you  re-create  your 
software  exactly  as  it  existed  on  January  8? 
When  a  bug  is  found  that  affects  multiple 
versions  of  your  released  software,  can 
your  team  fix  it  just  once,  and  then  apply 
that  fix  to  the  different  versions  automati¬ 
cally?  Can  a  developer  quickly  back  out  of 
a  bad  piece  of  code? 

There  is  more  to  version  control  than 
just keeping track of files. But before we proceed, we need to
define some simple
terminology:  We  use  check-in  to  mean  that 
a  programmer  has  submitted  his  or  her 
changes  to  the  version  control  system.  We 
use  checkout  to  refer  to  getting  a  personal 
version  of  code  from  the  version  control 
system  into  a  local  working  area. 

When  a  programmer  checks  in  code,  it 
is  now  potentially  available  to  the  rest  of 
the  team.  As  such,  it  is  only  polite  to 
ensure  that  this  new  code  actually  compiles 
successfully;  it  should  be  accompanied  by 
unit  tests  (more  on  this  later),  and  those 
tests  should  pass.  All  the  other  passing 
tests  in  the  system  should  continue  to  pass 
as  well  —  if  they  suddenly  fail,  then  you  can 
easily  trace  the  failure  to  the  new  code  that 
was  introduced. 

It is far easier to track down these sorts of problems right at
the point of creation
instead  of  days,  weeks,  or  even  months 
later.  To  exploit  this  effect,  you  must  allow 
and  encourage  frequent  check-ins  of  code 
multiple  times  per  day.  It  is  not  unusual  to 
see team members check in code 10 to 20
times  a  day.  It  is  unusual  —  and  very  dan¬ 
gerous  —  to  allow  a  programmer  to  go  a 
few  days  or  a  week  or  more  without  check¬ 
ing  in  code. 

Because  check-ins  occur  so  frequently, 
these  and  other  day-to-day  operations 
must  be  very  fast  and  low  ceremony.  A 
check-in  or  checkout  of  code  should  not 
take  more  than  five  to  15  seconds  in  gen¬ 
eral.  If  it  takes  an  hour,  people  will  not  do 
it,  and  you  have  lost  the  advantage. 

Now  some  people  get  a  little  nervous 
when  they  read  this  part.  They  fret  that  all 
of  this  code  is  being  dumped  into  the  sys¬ 
tem  without  being  reviewed,  tested  by  QA, 
audited,  or  whatever  else  their  methodolo¬ 
gy  or  environment  demands.  They  are 
rightfully  concerned  that  this  code  is  not 
yet  ready  to  be  part  of  a  release. 
Nonetheless,  it  must  still  be  in  the  version 
control  system  so  that  it  is  protected. 

Most  version  control  systems  provide  a 
mechanism to differentiate ongoing development changes from
official release candidates. Some feature explicit promotion
commands to allow this. You can accomplish the same thing in other
systems by
using  tags  (or  version  labels)  to  identify 
stable  release  versions  of  source  code  as 
opposed  to  code  that  is  in  progress. 

Regardless  of  the  mechanism,  it  must 
be  an  easy  operation  to  promote  develop¬ 
ment  changes  to  an  official  release  status. 
On  the  other  side  of  the  coin,  you  need  to 
be  able  to  back  out  changes  and  any  disas¬ 
trous  new  code  when  needed. 

Finally,  you  need  to  be  able  to  re-cre¬ 
ate  any  product  built  at  any  previous  point 
in  time.  This  ability  to  go  back  in  time  is 
crucial  for  effective  debugging  and  prob¬ 
lem  solving  (just  think  of  any  developer 
who  starts  a  discussion  with,  “Well,  it  used 
to  work”). 
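
To make the ideas of tagging a release and going back in time concrete, here is a toy model in Python. It is a sketch for illustration only and says nothing about how any real version control system stores its data; the `FileHistory` class, its method names, and the tag name are all assumptions.

```python
import bisect
from datetime import datetime


class FileHistory:
    """Toy revision history for one file (not how CVS stores data)."""

    def __init__(self):
        self._times = []      # check-in times, kept in ascending order
        self._contents = []   # contents[i] was checked in at times[i]
        self._tags = {}       # tag name -> revision index

    def check_in(self, when, content):
        if self._times and when <= self._times[-1]:
            raise ValueError("check-ins must move forward in time")
        self._times.append(when)
        self._contents.append(content)
        return len(self._times) - 1          # revision index

    def tag(self, name, revision):
        """Label a revision, for example as a stable release."""
        self._tags[name] = revision

    def as_of(self, when):
        """Return the content as it existed at `when`."""
        i = bisect.bisect_right(self._times, when) - 1
        if i < 0:
            raise LookupError("file did not exist yet")
        return self._contents[i]

    def at_tag(self, name):
        return self._contents[self._tags[name]]


history = FileHistory()
r0 = history.check_in(datetime(2004, 1, 5, 9, 0), "v1: initial")
r1 = history.check_in(datetime(2004, 1, 7, 16, 30), "v2: add report")
history.tag("RELEASE_1_0", r1)
r2 = history.check_in(datetime(2004, 1, 9, 11, 15), "v3: risky rewrite")

print(history.as_of(datetime(2004, 1, 8)))   # state on January 8
print(history.at_tag("RELEASE_1_0"))         # last stable release
```

Because every check-in is kept, backing out the risky rewrite is just a matter of asking for an earlier date or for the release tag.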

Commercial  and  freely  available  ver¬ 
sion  control  systems  vary  in  complexity, 
features,  and  ease  of  administration.  But 
one  feature  in  particular  is  worth  examin¬ 
ing:  whether  it  supports  strict  locking  or 
optimistic  locking.  In  systems  under  strict 
locking,  only  one  person  can  edit  a  file  at 
a  time.  While  that  sounds  like  a  good  idea, 
it  turns  out  to  be  unduly  restrictive  in 
practice. We favor the Concurrent Versions System (CVS)
<www.cvshome.org>, described in [3].

You  may  find  you  can  increase  paral¬ 
lelism  and  efficiency  in  your  team  by  using 
a  system  that  features  optimistic  locking. 
In  these  systems,  multiple  people  can  edit 
the  same  source  code  file  simultaneously. 
The  system  uses  conflict-resolution  algo¬ 
rithms  to  merge  the  disparate  changes 
together  in  a  sensible  manner.  Ninety-nine 
percent  of  the  time  it  works  perfectly  with¬ 
out  intervention.  Occasionally,  however, 
there  is  a  conflict  that  must  be  addressed 
manually.  At  no  point  is  anyone’s  work  in 
danger  of  being  lost,  and  it  ends  up  being 
much  more  efficient  to  coordinate  just 
these  few  conflicts  by  hand  instead  of  hav¬ 
ing  everyone  coordinate  every  change  with 
the  rest  of  the  team. 
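
The merge step can be sketched as a three-way comparison against the common starting version. The Python below is a deliberately simplified illustration, not the algorithm any particular tool uses: it assumes both people only edited lines in place (no insertions or deletions), and the `merge_lines` name and the conflict marker format are made up for this example.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge of three equal-length lists of lines.

    Returns (merged, conflicts), where conflicts lists the 0-based line
    numbers that both sides changed in different ways.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:            # both sides agree (or neither changed)
            merged.append(m)
        elif m == b:          # only they changed this line
            merged.append(t)
        elif t == b:          # only I changed this line
            merged.append(m)
        else:                 # both changed it differently
            merged.append(f"<<< {m} ||| {t} >>>")
            conflicts.append(i)
    return merged, conflicts


base   = ["timeout = 30", "retries = 3", "verbose = no"]
mine   = ["timeout = 60", "retries = 3", "verbose = no"]
theirs = ["timeout = 30", "retries = 3", "verbose = yes"]

merged, conflicts = merge_lines(base, mine, theirs)
print(merged)      # both edits survive
print(conflicts)   # empty: nothing to resolve by hand
```

Only when two people change the same line in different ways does the merge flag a conflict for a human to settle, which is why the manual cases are rare.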

Unit  Testing 

When  a  developer  makes  a  change  to  the 
code  on  your  project,  what  feedback  is 
available?  Does  the  developer  have  any 
way  of  knowing  if  the  new  code  broke 
anything  else?  Better  still,  how  do  you  know 
if  any  developer  has  broken  anything 
today?  A  system  of  automated  unit  tests 
will  give  you  this  information  in  real-time. 

Programming  languages  are  notorious 
for  doing  exactly  what  programmers  say, 
not  what  they  mean.  Like  a  petulant  child 
that  takes  your  expressions  completely  lit¬ 
erally,  the  computer  follows  our  instruc¬ 
tions  to  the  letter,  with  no  regard  at  all  to 
our intent. Technology has yet to produce the compiler that
implements “do what I mean, not what I say.”

So  in  keeping  with  the  idea  of  finding 
and  fixing  problems  as  soon  as  they  occur, 
you  want  programmers  to  use  unit  tests  (or 
checked  examples)  to  verify  the  computer’s 
literal  interpretation  of  their  commands.  It 
is  really  no  different  from  following 
through  with  a  subordinate  to  verify  that  a 
delegated  task  was  performed  —  except 
that  instead  of  just  checking  once,  auto¬ 
mated  unit  tests  will  check  and  recheck 
every  time  any  code  is  changed. 

There  are  some  requirements  to  using 
this  style  of  development,  however: 

•  The  code  base  must  be  decoupled 
enough  to  allow  testing.  When  code  is 
tightly  coupled,  it  is  very  difficult  to 
test  individual  pieces  in  isolation,  and 
harder  to  devise  unit  tests  that  exercise 
specific  areas  of  functionality.  Well- 
written  code,  on  the  other  hand,  is  easy 
to  test.  If  your  team  finds  that  the  code 
is  difficult  to  test,  then  take  that  as  a 
warning  sign  that  the  code  is  in  serious 
trouble  to  begin  with. 

• Only check in tested code. As we mentioned
above, checking in foists a programmer’s
code onto the rest of the
team.  Once  it  is  available  to  everyone, 
then  the  whole  team  will  begin  to  rely 
on  it.  Because  of  this  reliance,  all  code 
that  is  checked  in  must  pass  its  own 
tests. 

• In addition to making sure the code passes
its own tests, the programmer checking it in
must ensure nothing else breaks, either. This
simple  regression  helps  prevent  that 
frustrating  feeling  of  one  step  forward,  two 
steps  back  that  becomes  commonplace 
when  code  fixes  cause  collateral  dam¬ 
age  to  other  parts  of  the  code  base. 
Usually  these  bugs  then  require  fixes, 
which  in  turn  cause  more  damage,  and 
so  on.  The  discipline  of  keeping  all  the 
tests  running  all  the  time  prevents  that 
particular death spiral.

•  There  should  be  at  least  as  much  test 
code  as  production  code.  You  might 
think  that  is  excessive,  but  it  is  really 
just  a  question  of  where  the  value  of 
the  system  resides.  We  firmly  believe 
the  code  that  implements  the  system  is 
not  where  the  value  of  your  intellectu¬ 
al  property  lies.  Code  can  be  rewritten 
and  replaced,  and  the  new  code  (even 
an  entirely  new  system)  can  be  verified 
against  the  existing  tests.  Now  the 
most  precise  specification  of  the  sys¬ 
tem  is  in  executable  form  —  the  unit 
tests.  The  learning  and  experience  that 
goes  into  creating  the  unit  tests  is 
invaluable,  and  the  tests  themselves  are 
the best expression we have of that
knowledge.

We  will  look  at  implementing  unit  tests 
(aka  checked  examples)  in  much  greater 
detail  later  in  this  article. 

Automation 

An  old  saying  goes  the  cobbler’s  children  have 
no  shoes.  This  saying  is  particularly  appro¬ 
priate  for  our  use  of  software  tools  during 
software  development.  We  see  teams  rou¬ 
tinely  waste  time  using  manual  procedures 
that  could  easily  be  automated. 

Everyone clamors for software development
to be more defined and repeatable.
Well, the design and implementation of
software probably cannot be made repeatable
any more than you could make the
process of making hit movies repeatable.
But  the  production  of  software  is  another 
matter  entirely. 

The  process  of  taking  source  code 
files, bits of Extensible Markup Language (XML),
libraries,  and  other  resources  and  produc¬ 
ing  an  executable  for  the  end  user  should 
be  precisely  repeatable.  Given  the  same 
inputs,  you  want  the  same  outputs,  every 
time,  without  excuses.  In  combination 
with  version  control,  you  want  to  be  able 
to  go  back  in  time  and  reproduce  that 
same  pile  of  bits  that  you  would  have  pro¬ 
duced  on  January  8  just  as  easily.  That 
comes  in  very  handy  should  the 
Department  of  Justice  ask  for  it  politely,  or 
a  frustrated  customer  asks  for  it  somewhat 
less  politely  to  work  around  some  out¬ 
standing  bug. 

The rule we try to adopt is that any
manual process that is repeated twice is
likely to be repeated a third time (or more),
so it needs to be encapsulated within a
shell script, batch file, piece of Java code,
Job Control Language, or whatever.

Unit tests, as well as functional and
acceptance tests, should run automatically
as part of the build process.
You  will  probably  want  to  run  the  unit 
tests  (which  should  execute  very  quickly) 
with  every  build;  automatic  functional  and 
acceptance  tests  might  take  longer  and  you 
may  only  want  to  run  those  once  a  week, 
or  when  convenient. 
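
As a minimal sketch of this idea, the Python script below chains build steps and stops at the first failure, so that one command reproduces the whole build-and-test cycle. The step names and commands are hypothetical placeholders rather than part of any particular build tool; substitute whatever your project actually uses.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, command) step in order; stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None


# Hypothetical quick build: byte-compile the sources, then run the unit tests.
# Slower functional and acceptance tests could live in a second list that a
# scheduler runs nightly or weekly.
QUICK_BUILD = [
    ("compile", [sys.executable, "-m", "compileall", "-q", "."]),
    ("unit tests", [sys.executable, "-m", "unittest", "discover", "-q"]),
]


def main():
    failed = run_steps(QUICK_BUILD)
    if failed is not None:
        print(f"BUILD BROKEN at step: {failed}")
        return 1
    print("build OK")
    return 0
```

Calling `main()` from a scheduled job or a version-control hook gives the push-button behavior described here; the return value can be used as the process exit code so that other tools can react to a broken build.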

You  see,  not  only  does  automation 
make developers’ lives easier by providing
push-button  convenience,  it  helps  keep  the 
feedback  coming  by  constantly  checking 
the  state  of  the  software.  Automated 
builds  are  constantly  asking  two  questions: 
Does  the  software  build  correctly?  Do  all 
the  tests  still  pass  a  basic  regression?  With 
the  computer  performing  these  checks 
regularly,  developers  do  not  have  to. 
Problems  can  be  identified  as  soon  as  they 
happen,  and  the  appropriate  developer  or 
team lead can be notified immediately of
the problem [4]. Problems can be fixed
quickly,  before  they  have  a  chance  to  cause 
any  additional  damage.  That  is  the  benefit 
we  want  from  automation. 

Finally,  consider  how  the  build  com¬ 
municates  to  the  development  team  and  its 
management.  Does  the  team  lead  look  at 
the  latest  results  in  some  log  file  and  then 
report status to management? Does that not
constitute a manual process? It is relatively
easy to set up visual display devices,
ranging from liquid crystal display screens
to bubbling lava-style lamps to the new
and popular Ambient Orb [4].

Synergy 

These  three  practices  interlock  to  provide  a 
genuine  safety  net  for  developers.  Version 
control  is  the  foundation.  Unit  tests  and 
scripts  for  automation  are  under  version 
control,  but  version  control  needs  automa¬ 
tion  to  be  effective.  Unit  testing  needs  both 
version  control  and  automation. 

With  the  combination,  developers  can 
better  afford  to  take  chances,  experiment, 
and  find  the  best  solutions.  The  Rule  of 
Three  says  that  if  you  have  not  proposed 
at  least  three  solutions  to  a  problem  then 
you  have  not  thought  about  it  hard 
enough.  With  this  set  of  practices  in  place, 
developers  can  realistically  try  out  a  num¬ 
ber  of  different  solutions  to  a  problem: 
Version  control  will  keep  them  separate, 
and unit testing will help confirm the viability
of each solution. All this, with plenty
of automated support including continuous,
ongoing checks, ensures that the team
does not wander too far off into the
woods. This is how modern, successful
software development is done.

Unit Testing With Your Right-BICEP

You  can  strengthen  your  organization’s 
testing  skills  by  looking  at  six  specific  areas 
of  code  that  may  need  unit  tests.  These 
areas  are  remembered  easily  using  the 
mnemonic  Right-BICEP  [5]: 

Right  Are the results right?

B      Are all the boundary conditions correct?

I      Can you check inverse relationships?

C      Can you cross-check results using other means?

E      Can you force error conditions to happen?

P      Are performance characteristics within bounds?

Are  the  Results  Right? 

The first and most obvious area to test is
simply to see if the expected results are
right, that is, to validate the results. These are
usually the easy tests, as they represent the
answer to the key question: If the code ran
correctly, how would I know? Here is an
example of how being forced to think
about testing helps developers code better:
If this question cannot be answered satisfactorily,
then writing the code (or the test)
may be a complete waste of time.

“But  wait,”  you  cry  out,  “that  does  not 
sound  very  agile!  What  if  the  requirements 
are  vague  or  incomplete?  Does  that  mean 
we  can’t  write  code  until  all  the  require¬ 
ments  are  firm?”  No,  it  does  not  at  all.  If 
the  requirements  are  truly  not  yet  known, 
or  not  yet  complete,  you  can  always  make 
some  assumptions  as  a  stake  in  the  ground. 
They  may  not  be  correct  from  the  user’s 
point  of  view  (or  anyone  else  on  the  plan¬ 
et),  but  they  let  the  team  continue  to  devel¬ 
op.  And,  because  you  have  written  a  test 
based  on  your  assumption,  you  have  now 
documented  it  —  nothing  is  implicit. 

Of  course,  you  must  then  arrange  for 
feedback  with  users  or  sponsors  to  fine- 
tune  your  assumptions.  The  definition  of 
correct  may  change  over  the  lifetime  of  the 
code  in  question,  but  at  any  point,  you 
should  be  able  to  prove  that  it  is  doing 
what  you  think  it  ought. 
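
A test that pins down an assumption might look like the following sketch. The shipping rule, the amounts, and the function name are all invented for illustration; the point is only that the assumption is now written down in executable form and will fail loudly when the sponsor changes it.

```python
def shipping_cost(order_total):
    """Return the shipping charge for an order.

    ASSUMPTION (hypothetical, to be confirmed with the sponsor): orders of
    100.00 or more ship free; everything else pays a flat 5.00.
    """
    if order_total >= 100.00:
        return 0.00
    return 5.00


def test_shipping_cost_documents_current_assumption():
    # If the sponsor later picks a different threshold, this test fails and
    # points straight at the assumption that needs revisiting.
    assert shipping_cost(99.99) == 5.00
    assert shipping_cost(100.00) == 0.00
    assert shipping_cost(250.00) == 0.00
```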

Boundary  Conditions 

Identifying  boundary  conditions  is  one  of 
the  most  valuable  parts  of  unit  testing 
because  this  is  where  most  bugs  generally 
live  -  at  the  edges.  Some  conditions  you 
might  want  to  think  about  include  the  fol¬ 
lowing: 

•  Totally  bogus  or  inconsistent  input  val¬ 
ues  such  as  a  file  name  of 
!*W:X\\  {\&Gi/ w$>|g/h\#WQ@. 

•  Badly  formatted  data  such  as  an  e-mail 
address  without  a  top-level  domain 
<fred@foobar>. 

•  Empty  or  missing  values  such  as  0,  0.0, 
“”,  or  null. 

•  Values  far  in  excess  of  reasonable 
expectations  such  as  a  person’s  age  of 
10,000  years. 

•  Duplicates  in  lists  that  should  not  have 
duplicates. 

•  Ordered  lists  that  are  not  in  order  and 
vice-versa.  Try  handing  a  pre-sorted  list 
to  a  sort  algorithm,  for  instance,  or 
even  a  reverse-sorted  list. 

•  Things  that  arrive  out  of  order,  or  hap¬ 
pen  out  of  expected  order  such  as  try¬ 
ing  to  print  a  document  before  logging 
in,  for  instance. 

An easy way to think of possible
boundary conditions is to remember the
acronym CORRECT. For each of these
items, consider whether or not similar conditions
may exist in your method that you
want to test, and what might happen if
these conditions were violated [4]:

•  Conformance.  Does  the  value  con¬ 
form  to  an  expected  format? 

•  Ordering.  Is  the  set  of  values  ordered 
or  unordered  as  appropriate? 

•  Range.  Is  the  value  within  reasonable 
minimum  and  maximum  values? 

•  Reference.  Does  the  code  reference 
anything  external  that  is  not  under 
direct  control  of  the  code  itself? 

•  Existence.  Does  the  value  exist  (e.g.,  is 
non-null,  non-zero,  present  in  a  set, 
etc.)? 

•  Cardinality.  Are  there  exactly  enough 
values? 

• Time (absolute and relative). Is
everything happening in order? At the
right time? In time?
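
To make a few of these concrete, here is a small sketch of boundary tests for a hypothetical `largest()` function. The function and the test names are illustrative assumptions; each test is labeled with the CORRECT item it exercises.

```python
def largest(values):
    """Return the largest item in a non-empty list of numbers."""
    if not values:
        raise ValueError("largest() needs at least one value")
    biggest = values[0]
    for value in values[1:]:
        if value > biggest:
            biggest = value
    return biggest


def test_ordering_does_not_matter():
    assert largest([7, 8, 9]) == 9
    assert largest([9, 8, 7]) == 9
    assert largest([8, 9, 7]) == 9


def test_cardinality_single_value_and_duplicates():
    assert largest([1]) == 1
    assert largest([9, 7, 9, 8]) == 9


def test_range_all_negative():
    # A classic bug is to start the search from 0 instead of the first item.
    assert largest([-9, -8, -7]) == -7


def test_existence_empty_list_is_rejected():
    try:
        largest([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an empty list")
```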

Check  Inverse  Relationships 

Some  methods  can  be  checked  by  applying 
their logical inverse. For instance, developers
might check a method that calculates a
square  root  by  squaring  the  result,  and  test¬ 
ing  that  it  is  tolerably  close  to  the  original 
number.  They  might  also  check  that  some 
data  was  successfully  inserted  into  a  data¬ 
base  by  then  searching  for  it,  and  so  on. 

Be  cautious  when  the  same  person  has 
written  both  the  original  routine  and  its 
inverse,  as  some  bugs  might  be  masked  by 
a  common  error  in  both  routines.  Where 
possible,  use  a  different  source  for  the 
inverse  test.  In  the  square  root  example, 
we  might  use  regular  multiplication  to  test 
our  method.  For  the  database  search,  we 
will  probably  use  a  vendor-provided  search 
routine  to  test  our  insertion. 
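
The square root case can be sketched as follows. The `newton_sqrt` routine is a stand-in written for this example (an assumption, not code from any particular project); the test checks it with plain multiplication, which comes from a different source than the routine under test.

```python
import math


def newton_sqrt(x, tolerance=1e-12):
    """Approximate the square root of a non-negative number (Newton's method)."""
    if x < 0:
        raise ValueError("cannot take the square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    while True:
        better = 0.5 * (guess + x / guess)
        if abs(better - guess) <= tolerance * better:
            return better
        guess = better


def test_square_of_root_recovers_input():
    # Inverse check: squaring the result should be tolerably close to x.
    for x in [0.0, 0.25, 1.0, 2.0, 144.0, 1e6]:
        root = newton_sqrt(x)
        assert math.isclose(root * root, x, rel_tol=1e-9, abs_tol=1e-12)
```

Note the use of a tolerance rather than exact equality: floating-point results are rarely bit-for-bit identical to the original number.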

Cross-Check  Using  Other  Means 

Developers might also be able to cross-check
results of their method using different
means. Usually there is more than one
way  to  calculate  some  quantity;  we  might 
pick  one  algorithm  over  the  others  because 
it  performs  better  or  has  other  desirable 
characteristics.  That  is  the  one  we  will  use 
in  production,  but  we  can  use  one  of  the 
other  versions  to  cross-check  our  results  in 
the  test  system.  This  technique  is  especial¬ 
ly  helpful  when  there  is  a  proven,  known 
way  of  accomplishing  the  task  that  hap¬ 
pens  to  be  too  slow  or  too  inflexible  to  use 
in  production  code. 
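
As a small illustration, assume the production code uses a closed-form formula for a sum of squares, while the test system keeps a slow loop that is obviously correct. Both function names are hypothetical.

```python
def sum_of_squares(n):
    """Production version: closed form for 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


def slow_sum_of_squares(n):
    """Reference version: slow, but easy to verify by inspection."""
    return sum(k * k for k in range(1, n + 1))


def test_fast_version_agrees_with_reference():
    for n in range(0, 200):
        assert sum_of_squares(n) == slow_sum_of_squares(n)
```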

Another  way  of  looking  at  this  is  to  use 
different  pieces  of  data  from  the  code 
itself  to  make  sure  they  all  add  up.  For 
instance,  suppose  you  had  some  sort  of 
system  that  automated  a  lending  library.  In 
this  system,  the  number  of  copies  of  a 
particular  book  should  always  balance. 
That is, the number of copies that are
checked out plus the number of copies sitting
on the shelves should always equal the
total  number  of  copies  in  the  collection. 
These  are  separate  pieces  of  data,  and  may 
even  be  managed  by  different  pieces  of 
code,  but  they  still  have  to  agree  and  so  can 
be  used  to  cross-check  one  another. 
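
A toy version of the lending library shows the idea. The `Library` class below is a hypothetical sketch that tracks the three quantities separately, so the balance check has something real to catch.

```python
class Library:
    """Toy lending library; the three tallies are kept independently."""

    def __init__(self):
        self.total = {}        # title -> copies in the collection
        self.on_shelf = {}     # title -> copies available to borrow
        self.checked_out = {}  # title -> copies currently on loan

    def add_copies(self, title, count):
        self.total[title] = self.total.get(title, 0) + count
        self.on_shelf[title] = self.on_shelf.get(title, 0) + count
        self.checked_out.setdefault(title, 0)

    def check_out(self, title):
        if self.on_shelf.get(title, 0) == 0:
            raise LookupError(f"no copies of {title!r} on the shelf")
        self.on_shelf[title] -= 1
        self.checked_out[title] += 1

    def check_in(self, title):
        if self.checked_out.get(title, 0) == 0:
            raise LookupError(f"no copies of {title!r} are checked out")
        self.checked_out[title] -= 1
        self.on_shelf[title] += 1


def assert_copies_balance(library):
    """Cross-check: checked out + on shelf must equal the total, per title."""
    for title, total in library.total.items():
        assert library.checked_out[title] + library.on_shelf[title] == total


def test_copies_always_balance():
    library = Library()
    library.add_copies("Dune", 3)
    assert_copies_balance(library)
    library.check_out("Dune")
    library.check_out("Dune")
    assert_copies_balance(library)
    library.check_in("Dune")
    assert_copies_balance(library)
```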

Force  Error  Conditions 

In  the  real  world,  errors  happen.  Disks  fill 
up,  network  lines  drop,  e-mail  goes  into  a 
black  hole,  and  programs  crash.  You 
should  be  able  to  test  that  code  handles  all 
of  these  real-world  problems  by  forcing 
errors  to  occur. 

That is easy enough to do with invalid
parameters and the like, but to simulate
specific network errors (without unplugging
any cables) takes some special techniques,
including using mock objects.

In  movie  and  television  production, 
crews  will  often  use  stand-ins,  or  doubles, 
for  the  real  actors.  In  particular,  while  the 
crews  are  setting  up  the  lights  and  camera 
angles, they will use lighting doubles: inexpensive,
unimportant people who are
about  the  same  height  and  complexion  as 
the  very  expensive,  important  actors  who 
remain  safely  lounging  in  their  luxurious 
trailers. 

The  crew  then  tests  their  setup  with 
the  lighting  doubles,  measuring  the  dis¬ 
tance  from  the  camera  to  the  stand-in’s 
nose,  adjusting  the  lighting  until  there  are 
no  unwanted  shadows,  and  so  on,  while 
the  obedient  stand-in  just  stands  there  and 
does  not  whine  or  complain  about  lacking 
motivation  for  their  character  in  this  scene. 

What  you  can  do  in  unit  testing  is  sim¬ 
ilar  to  the  use  of  lighting  doubles  in  the 
movies:  Use  a  cheap  stand-in  that  is  kind 
of  close  to  the  real  thing,  at  least  superfi¬ 
cially,  but  that  will  be  easier  to  work  with 
for  your  purposes. 
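
Here is one way such a stand-in might look, using the `unittest.mock` module from the Python standard library. The `fetch_greeting` function and the connection's `read_line` method are hypothetical names made up for this sketch; the mock plays the part of the network connection and is told to fail on demand.

```python
from unittest import mock


def fetch_greeting(connection):
    """Return the server's greeting, or a fallback if the network fails."""
    try:
        return connection.read_line()
    except ConnectionError:
        return "offline"


def test_dropped_connection_is_handled():
    # The stand-in: a mock connection whose read_line() always fails.
    fake_connection = mock.Mock()
    fake_connection.read_line.side_effect = ConnectionResetError("line dropped")

    assert fetch_greeting(fake_connection) == "offline"
    fake_connection.read_line.assert_called_once_with()


def test_healthy_connection_passes_data_through():
    fake_connection = mock.Mock()
    fake_connection.read_line.return_value = "hello"

    assert fetch_greeting(fake_connection) == "hello"
```

No cables are unplugged: the error is forced entirely inside the test, so it can run on every build.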

Performance  Characteristics 

One  area  that  might  prove  beneficial  to 
examine  is  performance  characteristics  — 
not  performance  itself,  but  trends  as  input 
sizes  grow,  as  problems  become  more 
complex,  and  so  on.  Why?  We  have  all 
experienced  applications  that  work  fine  for 
a  year  or  so,  but  suddenly  and  inexplicably 
slow  to  a  crawl.  Often,  this  is  the  result  of 
a  silly  error  or  oversight:  A  database 
administrator  changed  the  indexing  struc¬ 
ture  in  the  database,  or  a  developer  typed 
an  extra  zero  into  a  loop  counter. 

What  we  would  like  to  achieve  is  a 
quick  regression  test  of  performance 
characteristics.  We  want  to  do  this  regular¬ 
ly,  every  day  at  least,  so  that  if  we  have 
inadvertently  introduced  a  performance 
problem we will know about it sooner
rather than later (because the nearer in
time  you  are  to  the  change  that  introduced 
the  problem,  the  easier  it  is  to  work 
through  the  list  of  things  that  may  have 
caused  that  problem). 

So,  to  avoid  shipping  software  with 
unsuspected  performance  problems, 
teams  should  consider  writing  some  rough 
tests  just  to  make  sure  that  the  perfor¬ 
mance  curve  remains  stable.  For  instance, 
suppose  the  team  is  working  on  a  compo¬ 
nent  that  lets  users  browse  the  Web  from 
within  their  application.  Part  of  the 
requirement  is  to  filter  out  access  to  Web 
sites  that  we  wish  to  block.  The  code 
works  fine  with  a  few  dozen  sample  sites, 
but  will  it  work  as  well  with  10,000? 
100,000?  We  can  write  a  unit  test  to  find 
out. 
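
A rough sketch of such a test follows. The `SiteFilter` class, the blocklist sizes, and the two-second budget are assumptions chosen for illustration; a real test would use the team's own component and its own performance target.

```python
import time


class SiteFilter:
    """Hypothetical filter: blocks any host found on its blocklist."""

    def __init__(self, blocked_hosts):
        self._blocked = set(blocked_hosts)

    def is_blocked(self, host):
        return host in self._blocked


def seconds_for_lookups(blocklist_size, lookups=10_000):
    """Time `lookups` checks against a blocklist of the given size."""
    site_filter = SiteFilter(f"site{i}.example" for i in range(blocklist_size))
    start = time.perf_counter()
    for i in range(lookups):
        site_filter.is_blocked(f"site{i}.example")
    return time.perf_counter() - start


def test_lookup_time_stays_within_budget():
    # Assumed budget of 2 seconds per run: deliberately loose, so that only
    # a gross slowdown trips the test rather than ordinary timing noise.
    for size in (100, 10_000, 100_000):
        assert seconds_for_lookups(size) < 2.0
```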

This  gives  us  some  assurance  that  we 
are  still  meeting  performance  targets.  But 
because  this  one  test  takes  six  to  seven 
seconds  to  run,  we  may  not  want  to  run  it 
every  time.  As  long  as  we  run  it  (say) 
nightly,  we  will  quickly  be  alerted  to  any 
problems  we  may  introduce  while  there  is 
still  time  to  fix  them. 

Getting  Started 

All  of  the  software  tools  mentioned  in  this 
article  are  freely  available  on  the  Web.  To 
get  started  using  these  practices  effectively, 
we  recommend  following  this  sequence: 

1. Get everything into version control.

2. Arrange for automatic, daily builds.
Increase these to multiple times per day
or continuously as soon as the process
begins to work smoothly.

3. Start writing unit tests for new code.
Where needed, add some unit tests to
existing code (but be pragmatic about
it; only add tests if they will really help,
not just for the sake of completeness).

4. Add the unit tests to the scheduled
builds.

You  can  begin  right  away.  Fire  up  that 
Web  browser  and  start  downloading  some 
software  if  you  do  not  already  have  it. 
These  ideas  will  not  fix  all  the  problems 
on  your  project,  of  course,  but  they  will 
provide  your  project  with  a  firm  footing 
so you can concentrate on the truly difficult
problems.

Web Sites


The  Agile  Alliance 

www.agilealliance.com/home 
The  Agile  Alliance  is  a  non-profit  organi¬ 
zation  dedicated  to  promoting  the  con¬ 
cepts  of  agile  software  development,  and 
helping  organizations  adopt  those  con¬ 
cepts.  The  site  features  an  extensive 
library  of  articles  about  agile  processes 
and  agile  development. 

COTS  Journal 

www.cotsjournalonline.com 
COTS  Journal  is  a  technology-in-context 
magazine  that  looks  at  any  embedded 
technology  anywhere  it  exists.  Its  editors 
assess the applicability of the world’s best
embedded  research  and  standards, 
methodologies,  and  products  for  govern¬ 
ment,  military,  and  aerospace  applica¬ 
tions.  COTS  Journal  provides  the  indus¬ 
try  with  technical  material  to  help  readers 
design  and  build  embedded  computers 
for  the  military  -  whether  for  benign 
applications  or  for  the  most  rugged,  mis¬ 
sion-critical  jobs. 

Concurrent Versions System

www.cvshome.org 

This  site  is  dedicated  to  supporting  the 
community  around  the  Concurrent 
Versions  System  (CVS).  The  CVS  is  the 
dominant  open-source  network-trans¬ 
parent  version  control  system.  CVS  is 
useful  for  everyone  from  individual 
developers  to  large,  distributed  teams  for 
the  following: 

•  Its  client-server  access  method  lets 
developers  access  the  latest  code  from 
anywhere  there  is  an  Internet  connec¬ 
tion. 

•  Its  unreserved  checkout  model  to  ver¬ 
sion  control  avoids  artificial  conflicts 
common  with  the  exclusive  checkout 
model. 

Control  Chaos.com 

http://controlchaos.com 
Control  Chaos.com  is  home  to  Scrum, 
an  agile,  lightweight  process  that  can  be 
used  to  manage  and  control  software  and 
product  development  using  iterative, 
incremental  practices.  Wrapping  existing 
engineering practices, including eXtreme
Programming and RUP, Scrum generates
the  benefits  of  agile  development  with 
the  advantages  of  a  simple  implementa¬ 
tion.  Scrum  significantly  increases  pro¬ 
ductivity  and  reduces  time  to  benefits 
while  facilitating  adaptive,  empirical  sys¬ 
tems  development.  Advanced  Develop¬ 
ment  Methods,  Inc.  maintains  this  Web 
site  to  provide  information,  news,  refer¬ 
ences,  and  a  cookbook  description  of 
Scrum. 

Air Force Research Laboratory/Information Directorate

www.rl.af.mil 

The  Air  Force  Research  Laboratory/ 
Information  Directorate  (AFRL/IF)  is  a 
confluence  of  information  specialists, 
electrical  and  computer  engineers,  com¬ 
puter  scientists,  mathematicians,  physi¬ 
cists,  and  a  supporting  staff.  The 
AFRL/IF  develops  systems,  concepts, 
and  technologies  to  enhance  the  Air 
Force’s capability to successfully meet the
challenges  of  the  information  age.  It 
develops  and  integrates  programs  to 
acquire  data  and  to  find  better  ways  to 
store,  process,  and  fuse  data  to  make  it 
into  information.  The  AFRL/IF  creates 
the  means  to  deliver  and  present  tailored 
information  to  allow  the  military  deci¬ 
sion  maker  to  have  the  total  sphere  of 
information  needed  for  successful  opera¬ 
tions  worldwide. 

MIT  System  Safety  and 
Software  Safety  Research 

http://sunnyday.mit.edu/safety.html
The goal of the Massachusetts Institute
of Technology’s (MIT) System Safety and
Software  Safety  Research  project  is  to 
develop  a  theoretical  foundation  for  safe¬ 
ty  and  a  methodology  for  building  safety- 
critical  systems  built  upon  that  founda¬ 
tion.  The  methodology  includes  special 
management  structures  and  procedures, 
system  hazard  analysis,  software  hazard 
analysis,  requirements  modeling  and 
analysis  for  completeness  and  safety, 
design  for  safety,  design  of  human- 
machine  interaction,  verification  (both 
testing  and  code  analysis),  operational 
feedback, and change analysis.

Your Quality Data Is Talking - Are You Listening?


David  B.  Putman 

Ogden-Air  Logistics  Center 

The transition from defect detection and removal activities to defect prevention activities may not be as smooth as you
would like. You may start asking, “Where do I start?” Or, you may have the feeling that you are not getting much
benefit from your defect prevention activities. You may also find yourself faced with a need to explore, evaluate, and
adopt new metrics. This article discusses some quantitative (non-statistical process control [SPC]) methods for looking
at your data; I will show the results of applying SPC to the same information, and provide a few “what next” options.

The  intent  of  this  article  is  to  provide  process  improvement  team  members,  program  managers,  and  supervisors  with 
ideas for defect prevention metrics to help them identify and analyze problem areas and to help them prioritize and
plan  their  defect  prevention  activities.  I  have  chosen  to  avoid  discussing  complex  mathematical  algorithms  in  favor  of 
providing  charts  to  aid  the  reader  in  participating  in  brainstorming  activities  to  identify  metrics  they  will  find  useful  for 
their  situation. 


You  know  it  is  impossible  to  fix 
every  problem  at  once  so  you 
review  the  defect  information  looking 
for  something  that  will  jump  out  and 
say,  “Fix  me.”  During  your  review  of 
the  data,  you  find  an  item  that  grabs 
your  attention.  You  are  confident  that 
you can reduce type xyz defects by 90
percent simply by providing the organization
with an annual eight-hour
refresher training course. You estimate
that  it  will  cost  $20,000  to  develop  a  for¬ 
mal  training  course,  and  you  get  man¬ 
agement  approval  to  implement  the 
idea.  A  few  weeks  later,  you  provide  the 
first  eight-hour  training  course  to  a 
team  of  50  employees. 

Six  months  later  you  analyze  the  data 
and,  to  your  credit,  you  exceeded  your 
goal:  Type  xyz  defects  were  reduced  by 
95  percent.  Unfortunately,  you  learn 
that  your  savings  in  development  and 
rework costs are significantly less than
the  annual  costs  for  the  training.  You 
also  realize  that  all  type  xyz  defects  were 
detected  internally  and  none  were  ever 
released  to  the  customer.  In  order  to 
maintain your integrity, you brief management
on your findings and recommend
discontinuing the annual eight-
hour  training  course. 

You  cannot  try  to  solve  every  type  of 
defect  at  once  so  clearly  you  need  a  way 
of  prioritizing  your  efforts.  You  also 
need  a  way  of  evaluating  the  possible 
solutions  (cost  versus  benefit)  to  deter¬ 
mine  the  most  effective  solution.  This 
article  is  aimed  at  giving  the  reader 
some  ideas  on  what  type  of  defect 
information  should  be  captured,  and 
ways  to  present  that  data.  Armed  with 
the  proper  information,  a  defect  pre¬ 
vention  team  will  be  able  to  prioritize  its 
efforts, evaluate the effectiveness of the
proposed solutions, and determine the
proper  corrective  action. 

Quantitative  (Non-Statistical 
Process  Control)  Data  Analysis 

As  our  Software  Engineering  Division 
at  the  Ogden-Air  Logistics  Center 
increased  its  focus  on  defect  prevention 
activities,  the  Extended  Software 
Engineering  Process  Group  (ESEPG) 
found  that  it  was  not  receiving  much 
utility  from  its  existing  quality  metrics. 
At  the  request  of  the  ESEPG,  I  began 
analyzing  its  data  in  an  effort  to  recom¬ 
mend  some  potential  metrics  that  would 
facilitate  defect  prevention  activities. 

In  my  data  analysis,  I  explored  a  vari¬ 
ety  of  ways  to  show  the  data  in  order  to 
provide  the  ESEPG  with  the  ability  to 
prioritize  its  efforts.  Our  group  had  col¬ 
lected  a  vast  amount  of  information,  so 
the  first  task  was  to  develop  appropriate 
filters to give me a better ability to
extract the data in a manner that would
facilitate  the  analysis.  My  first  look  at 
the  information  was  by  the  category  and 
severity  of  the  defect  as  shown  in 
Figure  1.  If  defects  of  a  high  severity 
were  getting  through  the  process,  then 
this  would  be  a  logical  starting  point  for 
defect  prevention  activities. 

As  seen  in  Figure  1,  almost  all  of  the 
recent defects were identified as being of
minor severity. At this point, I changed
the  filters  to  extract  the  information  for 
18  different  categories  and  types  of 
defects,  and  then  again  for  19  different 
categories  and  locations  for  the  defects. 
Table  1  (page  28)  provides  an  example 
of  how  each  defect  is  characterized  by 
category,  type,  and  location. 

The documentation defects analysis
showed that typographical errors in the
engineering documentation used to
maintain the product were the most
common defect type found during peer
reviews. I then began to perform a similar
analysis on the software defects
using the same type of metrics developed
for documentation defects. Too
much information on a chart can make
it difficult to understand, so to keep the
information presentable, the documentation
metrics were displayed on one
chart and the software metrics on
another. A few items from both categories
were selected to display on the
chart shown in Figure 2.

Figure 1: Quantity of Defects By Severity and Category

Category        Type            Location
Software        Syntax          Source Code
Software        Typographical   Source Code (e.g., comment)
Documentation   Typographical   User's Manual
Documentation   Typographical   Customer Product Acceptance Form

Table 1: Defects Characterized By Category, Type, and Location

Figure 2: Peel Back - Quantities of Some of the Defect Types
[Bar chart. Vertical axis: Quantity of Defects. Horizontal axis: Defect Type
(Software Execution, Software Flow, Software Logic, Software Style Guide,
Software Syntax, Document Typo).]

The  information  shown  in  Figure  2 
can be used quite easily to convince a
defect prevention team that they need
to  jump  in  and  begin  taking  action  to 
reduce  the  number  of  typographical 
errors.  But  the  information  presented 
so  far  does  not  answer  the  question,  “Is 
working  the  typographical  errors  the 
best  use  of  our  time?”  To  answer  this,  I 
developed  a  chart  similar  to  the  one 
shown  in  Figure  3. 

Figure  3  shows  an  example  of  the 
rework  costs;  this  chart  was  developed 
to  enable  an  easy  comparison  between 
Figures 2 and 3. Presenting and comparing
the information in this manner (as
shown  in  Figures  1-3)  is  a  method  that 
you  may  want  to  consider  to  help  prior¬ 
itize  your  defect  prevention  activities. 
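
The comparison between quantity and rework cost can be sketched in a few lines. The numbers below are made up purely for illustration and are not the data behind Figures 2 and 3; the function name and the choice of minutes as the cost unit are likewise assumptions.

```python
def rank_by_total_cost(defects):
    """Rank defect types by total rework cost, highest first.

    `defects` maps a defect type to (quantity, rework_minutes_per_defect).
    """
    totals = {
        defect_type: quantity * minutes_each
        for defect_type, (quantity, minutes_each) in defects.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample: typos are the most numerous, but cheap to fix.
SAMPLE = {
    "Document Typo": (120, 5),
    "Software Logic": (15, 360),
    "Software Syntax": (40, 30),
}

for defect_type, minutes in rank_by_total_cost(SAMPLE):
    print(f"{defect_type:<16} {minutes:5d} rework minutes")
```

With these invented figures, the most frequent defect type ends up last once cost is taken into account, which is exactly the kind of reversal the side-by-side charts are meant to expose.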

Applying  Statistical  Process 
Control 

Knowing  the  information  discussed  ear¬ 
lier,  many  teams  may  think,  “We  know 
everything  that  we  need  to  know.  What 
can  statistical  process  control  (SPC)  tell 
us  that  we  don’t  already  know?”  To  start 
with,  the  information  shown  in  Figures 
1-3  does  not  identify  whether  or  not  the 
process  is  under  control,  and  the  charts 
do  not  identify  random  events  versus 
non-random  events.  Non-random 
events  can  be  assigned  to  specific  caus¬ 
es,  which  you  may  be  able  to  prevent  or 
take  into  future  consideration  as  a  risk. 

At least seven watch-for indicators
have  been  identified  as  events  that  can 
be  assigned  to  a  cause;  they  have  a  very 
low  probability  of  being  random  in 
nature.  These  watch-for  indicators 
include  the  following: 

•  One  or  more  points  above  the  upper 
natural  process  limit  (UNPL)  or 
below  the  lower  natural  process  limit 
(LNPL). 

•  Seven  or  more  consecutive  points  on 
one  side  of  the  center  line. 

•  Six  or  more  points  in  a  row  steadily 
increasing  or  decreasing. 

•  Fourteen  points  in  a  row  alternating 
up  and  down. 

•  Two  out  of  three  consecutive  points 
in  the  outer  third  of  the  control 
region. 

•  Fifteen  or  more  points  in  a  row 
within  the  center  one-third  region  of 
the  chart. 

•  Eight  or  more  points  on  both  sides 
of  the  control  chart  with  none  in  the 
center  one-third  region  of  the  chart. 

Using the same data, I generated the
Sample (X) and moving Range (XmR)
Control Charts for the total number of
defects found during each peer review.
The Sample (X) run chart is shown in
Figure 4.

The LNPL shown in Figure 4 was not allowed to go below zero because it
is impossible to have a negative number of findings. As can be seen in
Figure 4, only one anomaly occurred where the number of peer review
findings exceeded the UNPL.
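
The article does not spell out how the limits were calculated. As an
illustration only, the sketch below uses the conventional XmR
(individuals and moving range) formulas, in which the limits sit 2.66
average moving ranges either side of the mean, and floors the LNPL at
zero as described above. The function name and the defect counts are
made up for the example.

```python
from statistics import mean


def xmr_limits(counts):
    """Return (center, unpl, lnpl) for an individuals (X) chart."""
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(counts, counts[1:])]
    center = mean(counts)
    mr_bar = mean(moving_ranges)
    # 2.66 is the standard XmR scaling constant (3 / d2, with d2 = 1.128).
    unpl = center + 2.66 * mr_bar
    # A count of findings cannot be negative, so floor the lower limit at zero.
    lnpl = max(0.0, center - 2.66 * mr_bar)
    return center, unpl, lnpl


# Hypothetical defects-per-review counts, not data from the article.
defects_per_review = [4, 6, 3, 5, 7, 4, 5, 6, 4, 5, 3, 6, 25, 5, 4, 6]
center, unpl, lnpl = xmr_limits(defects_per_review)
above_unpl = [i for i, x in enumerate(defects_per_review) if x > unpl]
print(f"center={center:.2f} UNPL={unpl:.2f} LNPL={lnpl:.2f} above={above_unpl}")
```

The same arithmetic fits comfortably in a spreadsheet: one column of
counts, one column of moving ranges, and three cells for the center
line and the two limits.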

I was concerned that by including all defect types in the run chart, I
was masking defects that could be assigned to a cause. I then developed
individual XmR charts for 18 different types of defects and for 19
different defect locations (okay, so I need a life). Peeling back the
data and looking at the specific defects revealed an additional 18
anomalies where the quantity exceeded the UNPL. Figure 5 shows one of
these additional charts; in this case, there were five instances in
which the quantity of defects exceeded the UNPL.

Figure 3: Rework Costs per Defect Type

This effort identified a total of 19 anomalies (see Note 1) in which
the quantity of defects exceeded the UNPL. As I started looking at each
anomaly, a common attribute appeared in the data. All 19 anomalies
pointed back to one small (see Note 2), highly skilled team working on
a project in which the original proposal was too optimistic and based
upon an unproven technology. The project quickly went over schedule as
soon as the unproven technology failed to meet the anticipated
productivity. The team was under a lot of pressure from both the
customer and management to bring the project back on schedule. The
harder the team tried to bring the project back on schedule, the louder
the voice of the process became.

As I further analyzed the project’s data, I started using this analogy:
putting three valves on the end of a garden hose does not increase the
flow of the water through the hose. The process capability was limited
by constraints within the process such as manpower, equipment
availability, and equipment throughput. In essence, the process
capability resisted heroic efforts to bring the project back into the
contract schedule. When the employees tried to rush through their own
personal quality checks, they were met with higher defect rates found
during the peer reviews.

SPC  Versus  Non-SPC 

The  following  is  a  comparison  of  the 
two  methods  of  quantitative  analysis. 

Non-SPC 

The benefit of quantitative non-SPC types of metrics is simplicity. The
metrics and charts may seem easier to develop, the metrics may take
less time to develop, and the audience may find these charts a lot
easier to understand. Depending upon the data collected, these may be
about the only metrics the team can develop. One drawback is that you
do not necessarily know up front if the causes of the defects are
random in nature or attributable to specific causes.

Figure 4: XmR Sample Run Chart for All Defects (Sample (X) chart for
all defects; reviews for the last 12 months; series plotted: Review,
UNPL, LNPL)

Based upon the software style guide rework costs shown in Figure 3, I
recommended that the ESEPG first consider a variety of training options
to reduce the style guide defects. The corrective actions for these
defects could range from creating a heightened awareness (such as a
team staff meeting) of the need to follow the style guide, to providing
the team with formalized training on it. The cost of implementing each
of the proposed solutions can be calculated, the annual rework costs
are known, and based upon the perceived success of the proposed
solutions, the defect prevention team can determine the appropriate
corrective action plan.
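
One way to make that comparison concrete is to estimate, for each
proposed solution, the rework cost it is expected to avoid in a year
and subtract the cost of implementing it. The sketch below is only an
illustration of that arithmetic; every number, option name, and
function name in it is a hypothetical placeholder rather than data from
this article, and the expected reduction is a judgment call by the
team.

```python
def net_annual_benefit(annual_rework_cost, expected_reduction, implementation_cost):
    """Rework cost avoided in one year minus the cost of the corrective action.

    expected_reduction is the perceived success of the solution, expressed
    as the fraction of the rework it is expected to eliminate (0.0 to 1.0).
    """
    return annual_rework_cost * expected_reduction - implementation_cost


def rank_options(annual_rework_cost, options):
    """Sort options from highest to lowest net annual benefit."""
    scored = [
        (net_annual_benefit(annual_rework_cost, reduction, cost), name)
        for name, (reduction, cost) in options.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical figures for illustration only.
ANNUAL_STYLE_GUIDE_REWORK = 20_000.0
OPTIONS = {
    # name: (expected fraction of rework avoided, implementation cost)
    "awareness briefing at a staff meeting": (0.10, 500.0),
    "formal style guide training": (0.40, 6_000.0),
}

for benefit, name in rank_options(ANNUAL_STYLE_GUIDE_REWORK, OPTIONS):
    print(f"{name}: net first-year benefit {benefit:,.0f}")
```

The ranking is only as good as the estimated reductions, so it is worth
rerunning the comparison with pessimistic and optimistic values before
committing to a plan.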

SPC 

The benefit of applying SPC techniques as a project management tool is
that they may help identify problems that could remain hidden by other
quantitative analysis methodologies. The calculations are a little more
complex, but once you set up your calculations in something like a
spreadsheet file, the file can easily be changed for a new set of data.

The results of this analysis led to a decision that every program
manager will probably have to make sometime in his or her career. The
proper corrective action was obvious, but at first it was not well
received by the customer. After determining the process capability, I
calculated a new baseline for the project and presented the new
baseline to the customer. My analysis included the negative quality
impacts experienced from trying to bring the project back on schedule
and the argument that the new baseline would reduce life-cycle costs by
providing the customer with higher quality products. The damage repair
in customer satisfaction took many months to achieve, but the last
feedback that I received was that customer satisfaction did improve
over time. The team met the re-baselined plan and provided the customer
with a higher quality product.

Figure 5: XmR Sample Chart for the Quantity of Defects Found in the Engineering Documentation

Figure 6: Moving Window of the Probability of at Least One Defect, or No Defects, Being Found

Notes

1. One anomaly (reference Figure 4) plus an additional 18 anomalies
identified by peeling back the data equates to a total of 19 anomalies.

2. The project accounted for only 5 percent of the workload within the
branch, yet 100 percent of the defect anomalies pointed to that one
project.

What  Next? 

All of the charts discussed in this article provide a historical view
of process activities. Displaying the data in a manner that shows
trends may enable management to move from reactive management toward
proactive management activities. I explored a variety of options for
watching for trends in quality. One option that seemed to give some
insight into the process was to show the trend in the probability of
one or more defects of a given type being found; for each peer review I
set a yes/no flag to indicate whether any defects of that nature
occurred. I established the probability calculation based upon the sum
of defects found in the last 50 peer reviews. By using the information
from the last 50 reviews, I was able to develop a chart with a moving
window (last 50) that would show a trend in the data.
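
The moving-window calculation can be sketched in a few lines of Python.
This is one plausible reading of the description above, not the
author's spreadsheet: it assumes the probability for a defect type is
the fraction of the last 50 reviews whose yes/no flag was set, and the
function name is hypothetical.

```python
from collections import deque


def moving_probability(flags, window=50):
    """Fraction of the most recent `window` reviews in which the flag was set.

    flags is a sequence of booleans in review order (True means at least
    one defect of the type was found). One value is produced per review
    once a full window of reviews is available.
    """
    recent = deque(maxlen=window)  # the oldest review drops out automatically
    trend = []
    for flag in flags:
        recent.append(1 if flag else 0)
        if len(recent) == window:
            trend.append(sum(recent) / window)
    return trend
```

Plotting the returned values against the review sequence gives a trend
line for that defect type. Running the same function on a flag that
marks reviews with no defects at all gives the complementary trend.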

I chose to use the last 50 reviews for two reasons. First, it was large
enough to give a fair representation of the probability of the defect
occurring in the product. Second, even with a sample size of 50, the
time period spanning the reviews was less than a year. Figure 6 shows
the trends for two of the defect types; the undesirable trends include
the increasing probability of finding style guide and typographical
defects. Smaller improvements in other defect types added up to a
noticeable improvement trend in the probability of not finding any
defects. The probability of not finding any defects was promising, but
the undesirable trends again reinforced a need to take action to reduce
the style guide and typographical defects.

Conclusion 

The three attributes of the product being developed are cost, schedule,
and quality. When projects fall behind schedule and/or go over budget,
efforts are made to bring the project back on track, but it is
undesirable to do this at the expense of quality. Applying the SPC
concepts to the process revealed that our current course of action on
one project risked delivering poor-quality products to the customer. In
this case, the application of the SPC concepts enabled us to change our
course of action to improve the quality of the products delivered to
the customer.

As  shown  earlier,  a  lot  of  knowledge 
can  be  gained  by  a  careful  analysis  of 
the  data.  By  carefully  analyzing  the  data 
and  comparing  the  perceived  benefits 
versus  the  costs,  the  defect  prevention 
teams  can  select  activities  that  provide 
the  best  return  on  investment. 

Final  Note 

You may find automated charts to be one of your greatest assets, but
they can also be one of your greatest liabilities. The person who
extracts the data, performs the calculations, and builds the charts
seems to have a much better understanding of the data behind the chart
than does the person who gets the charts from an automated process.

Broken Windows


BackTalk 


In 1969, Stanford University psychologist Philip Zimbardo conducted an
experiment on human nature. He abandoned two similar cars in different
neighborhoods: one in the heart of the Bronx, N.Y., the other in an
affluent neighborhood in Palo Alto, Calif. He removed the license
plates, left the hoods open, and chronicled what happened.

In  the  Bronx,  within  10  minutes  of  abandonment,  people 
began  stealing  parts  from  the  alluring  car.  It  took  approximately 
three  days  to  strip  the  car  of  all  valuable  parts.  Once  stripped  of 
economic  value,  the  car  then  became  a  source  of  entertainment. 
People  smashed  windows,  ripped  upholstery,  and  chipped  the 
paint  —  reducing  the  car  to  a  pile  of  junk. 

In  Palo  Alto,  something  quite  different  happened  —  nothing. 
For  more  than  a  week,  the  car  sat 
unmolested.  There  was  no  theft, 
vandalism,  or  even  a  scratch. 

Puzzled,  Zimbardo,  in  plain  view 
of  everyone,  took  a  sledgehammer 
and  smashed  part  of  the  car.  Soon 
passersby  were  taking  turns  with 
the  hammer,  delivering  blow  after 
satisfying  blow.  Within  a  few 
hours,  the  vehicle  was  resting  on 
its  roof,  demolished. 

Among the scholars who took note of Zimbardo’s experiment were two
criminologists: James Q. Wilson and George Kelling. The experiment
spurred their now-famous broken windows theory of crime. Their premise
is that if a broken window remains unrepaired, vandals will soon break
a building’s remaining windows.

Why is that? Aside from the fact that it is fun to break windows, why
does the broken window invite further vandalism? Wilson and Kelling’s
hypothesis is that the broken window sends a signal that no one is in
charge, breaking more windows costs nothing, and there are no
consequences to breaking more windows.

The broken window is a metaphor for ways behavioral norms break down in
a community. If one person scrawls graffiti on the wall, others will
soon be spraying paint. If one aggressive panhandler begins working a
street block, others will follow. In short, once people begin
disregarding norms that keep order in a community, both order and
community unravel.

Police in big cities have dramatically reduced crime rates by applying
this theory. Rather than concentrating on felonies, they aggressively
police minor offenses like graffiti, public drinking, panhandling, and
littering. This police enforcement sends a signal that broken-window
behavior has consequences in a city. If you cannot get away with
jumping a turnstile in the subway, you had better not try armed
robbery.

At  this  point,  you  are  wondering  what  crime  in  the  streets  has 
to  do  with  software  development.  The  broken  window  theory 
plays  out  in  software  development  organizations  daily.  Software 
managers  inadvertently  send  signals  that  no  one  is  in  charge  and 
there  are  no  costs  or  consequences  to  ignoring  project  norms. 
Before  you  say  “not  on  my  project,”  you  might  want  to  look  for 
some  classical  broken  windows  in  your  organization. 

Problems arise when managers allow prima donnas to dominate,
intimidate, and dictate projects. It is tempting to let a technical
superstar take the lead, especially for managers who question their own
engineering talent, but they will pay in the end. Once ideas are
stifled and insults start flying, team members will opt out or limit
their contribution to the project. The prima donna will get overloaded
and then the vandalism will begin. Broken stained glass is still broken
glass. Do you cultivate sages who are inclusive and teach their craft,
or prima donnas who hide their weaknesses and feed their insecurities?

Do you have managers whose directions are as clear as mud? Like the
opaque window in a bathroom, they appear to shed light on the subject
but in reality, things are not that bright or clear. After a while,
some engineers enjoy these opaque managers because if directions are
not clear then accountability is not clear. If accountability is not
clear, then this project is a free-for-all, so start breaking the
windows. Are you blocking the light or letting the sunshine in?

Troubles occur when managers exert their authority by hoarding
information and tightening control. Collaboration and initiative are
dirty words to these comptrollers. Everything runs on maximum
management sanction and minimum information sharing. Processes stall or
wander, engineers revert to cruise control, and information flows like
Molly Brown through a portal window. Do you lead, manage, or choke your
projects?

Then there are indecisive managers, the sliding glass doors of
management. People are enamored with sliding glass doors until they own
one. Then they discover the door is always open when they want it
closed and kids are constantly running into it when closed, thinking it
is open. Like a sliding glass door, you never seem to be accordant with
indecisive managers. They never provide direction and avoid decisions
until you make a move, then there they are, blocking progress or
letting the air out of your project. Are you indecisive? Need more time
to think about it?

Space and time are running out, so we will have to discuss the skylight
manager, triple-pane glass manager, tinted window manager, two-way
mirror manager, and the cockpit canopy manager another time.

The  point  is,  once  managers  begin  disregarding  norms  that 
keep  order  in  a  project,  both  order  and  the  project  unravel. 
Repair  the  broken  windows  in  your  management  style  and  order 
will  return. 

Amazingly, I think Wilson and Kelling’s theory may explain the mystery
of software quality. From its first release to present versions,
Microsoft Windows was released broken. Distributing broken Windows
sends a signal that no one is in charge, there are no consequences, and
breaking more Windows software is okay. Software norms break down and
our systems are vandalized, all from broken windows.

— Gary  Petersen 
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